Identifying clusters of similar objects in data plays a significant role in a wide range of
applications. As a model problem for clustering, we consider the densest $k$-disjoint-clique
problem, whose goal is to identify the collection of $k$ disjoint cliques of a given weighted
complete graph maximizing the sum of the densities of the complete subgraphs induced by
these cliques. In this paper, we establish conditions ensuring exact recovery of the densest
$k$ cliques of a given graph from the optimal solution of a particular semidefinite program.
In particular, the semidefinite relaxation is exact for input graphs corresponding to data
consisting of $k$ large, distinct clusters and a smaller number of outliers.

This approach also yields a semidefinite relaxation for the biclustering problem with
similar recovery guarantees. Given a set of objects and a set of features exhibited by these
objects, biclustering seeks to simultaneously group the objects and features according to
their expression levels. This problem may be posed as partitioning the nodes of a weighted
bipartite complete graph such that the sum of the densities of the resulting bipartite complete
subgraphs is maximized. As in our analysis of the densest $k$-disjoint-clique problem, we
show that the correct partition of the objects and features can be recovered from the
optimal solution of a semidefinite program in the case that the given data consists of
several disjoint sets of objects exhibiting similar features.

1 Introduction 



The goal of clustering is to partition a given data set into groups of similar objects, called 
clusters. Clustering is a fundamental problem in statistics and machine learning and plays 
a significant role in a wide range of applications, including information retrieval, pattern 
recognition, computational biology, and image processing. The complexity of finding an 
optimal clustering depends significantly on the measure of fitness of a proposed partition, 
but, in general, is an intractable combinatorial problem. For this reason, heuristics are
used to cluster data in most practical applications. Unfortunately, although much empirical
evidence exists for the usefulness of these heuristics, few theoretical guarantees ensuring
the quality of the obtained partition are known, even for data containing well-separated
clusters. For a recent survey of clustering techniques and heuristics, see [5]. In this paper,
we establish conditions that ensure that the optimal solution of a particular convex
optimization problem yields a correct clustering under certain assumptions on the input
data set.

* Supported in part by NSERC (Natural Sciences and Engineering Research Council of Canada).
† Institute for Mathematics and its Applications, College of Science and Engineering, University of
Minnesota, 207 Church Street SE, 400 Lind Hall, Minneapolis, Minnesota, 55455, U.S.A., bpames@gmail.com

Our approach to clustering is based on partitioning the similarity graph of a given set of
data. Given a data set $S$ and a measure of similarity between any two objects, the similarity
graph $G_S$ is the weighted complete graph with nodes corresponding to the objects in the
data set and each edge $ij$ having weight equal to the level of similarity between objects $i$
and $j$. For this representation of data, clustering the data set $S$ is equivalent to partitioning
the nodes of $G_S$ into disjoint cliques such that edges connecting any two nodes in the same
clique have significantly higher weight than those between different cliques. Therefore, a
clustering of the data may be obtained by identifying subgraphs of $G_S$ that are dense, in the
sense of having large average edge weight.

We consider the densest $k$-partition problem as a model problem for clustering. Given
a weighted complete graph $K = (V, E, W)$ and an integer $k \in \{1, \dots, |V|\}$, the densest
$k$-partition problem aims to identify the partition of $V$ into $k$ disjoint sets such that the
sum of the average edge weights of the complete subgraphs induced by these sets is
maximized. Unfortunately, the densest $k$-partition problem is NP-hard, since it contains
the minimum sum of squared Euclidean distance problem, known to be NP-hard [25], as a
special case. In Section 2, we consider the related problem of finding the set of $k$ disjoint
complete subgraphs maximizing the sum of their densities. We model this problem as
a quadratic program with combinatorial constraints and relax it to a semidefinite program
using matrix lifting. We establish that the optimal solution of this semidefinite relaxation
coincides with that of the original combinatorial problem for certain program inputs. In
particular, we show that the input graphs for which the relaxation is exact include the set
of graphs with edge weights concentrated on a particular collection of disjoint subgraphs,
and we provide a general formula for the clique sizes and number of cliques that may be
recovered.

In Section 3, we establish similar results for the biclustering problem. Given a set
of objects and features, biclustering, also known as co-clustering, aims to simultaneously
group the objects and features according to their expression levels. That is, we would like
to partition the objects and features into groups of objects and features, called biclusters,
such that objects strongly exhibit features within their bicluster relative to the features
within the other biclusters. Hence, biclustering differs from clustering in the sense that it
does not aim to obtain groups of similar objects, but instead seeks groups of objects similar
with respect to a particular subset of features. Applications of biclustering include identifying
subsets of genes exhibiting similar expression patterns across subsets of experimental
conditions in analysis of gene expression data, grouping documents by topics in document
clustering, and grouping customers according to their preferences in collaborative filtering
and recommender systems. For an overview of the biclustering problem, see [71 113].
As a model problem for biclustering, we consider the problem of partitioning a bipartite
graph into dense disjoint subgraphs. If the given bipartite graph has vertex sets corresponding
to sets of objects and features, with edges indicating the expression level of each feature by
each object, then each dense subgraph will correspond to a bicluster of objects strongly
exhibiting the contained features. Given a weighted bipartite complete graph $K = ((U, V), W)$
and an integer $k \in \{1, \dots, \min\{|U|, |V|\}\}$, we seek the set of $k$ disjoint bipartite complete
subgraphs with the sum of their densities maximized. We establish that this problem may be
relaxed to a semidefinite program and show that, for certain program instances, the correct
partition of $K$ can be recovered from the optimal solution of this relaxation. In particular,
this relaxation is exact in the special case that the edge weights of the input graph are
concentrated on some set of disjoint bipartite subgraphs. When the input graph arises
from a given data set, the relaxation is exact when the underlying data set consists of
several disjoint sets strongly exhibiting nonoverlapping sets of features.

Our results build upon those of recent papers regarding the clusterability of data. These
papers generally contain results of the following form: if a data set is randomly sampled
from a distribution of "clusterable" data, then the correct partition of the data can be
obtained efficiently using some heuristic, such as the $k$-means algorithm [T], spectral
clustering |2T1 [201 EOl II], or convex optimization [3, 19, 23]. We obtain similar guarantees:
if the underlying data set consists of several sufficiently distinct clusters or biclusters, then the
correct partition of the data can be recovered from the optimal solution of our relaxations.
We model this ideal case for clustering using random edge weight matrices constructed
so that weight is, in expectation, concentrated heavily on the edges of a few disjoint
subgraphs. We will establish that this random model for clustered data contains those
previously considered in the literature and, in this sense, our results are a generalization
of these earlier theoretical guarantees.

More generally, our results follow in the spirit of, and borrow techniques from, recent
work regarding sparse optimization and, in particular, the nuclear norm relaxation for rank
minimization. The goal of matrix rank minimization is to find a minimum-rank solution
of a given linear system, i.e., to find the optimal solution $X^* \in \mathbf{R}^{m \times n}$ of the optimization
problem $\min\{\mathrm{rank}\, X : \mathcal{A}(X) = b\}$ for a given linear operator $\mathcal{A} : \mathbf{R}^{m \times n} \to \mathbf{R}^p$ and
vector $b \in \mathbf{R}^p$. Although this problem is well known to be NP-hard, several recent papers
[271 13 121 [m [261 [HI [211 [21] have established that, under certain assumptions on $\mathcal{A}$ and $b$, the
minimum-rank solution is equal to the optimal solution of the convex relaxation obtained
by replacing $\mathrm{rank}\, X$ with the sum of the singular values of $X$, the nuclear norm $\|X\|_*$.
This relaxation may be thought of as a matrix analogue of the $\ell_1$ norm relaxation for the
cardinality minimization problem, and these results generalize similar recovery guarantees
for compressed sensing (see [TT l [T m [T2]). For example, the nuclear norm relaxation is exact
with high probability if $\mathcal{A}$ is a random linear transform whose matrix representation has
i.i.d. Gaussian or Bernoulli entries and $b = \mathcal{A}(X_0)$ is the image of a sufficiently low-rank
matrix $X_0$ under $\mathcal{A}$. We prove analogous results for an instance of rank-constrained
optimization. To identify the densest $k$ complete subgraphs of a given graph, we seek a rank-$k$
matrix $X$ maximizing a linear function of $X$, depending only on the edge weights $W$
of the input graph, subject to some linear constraints. We show that the optimal rank-$k$
solution is equal to that obtained by relaxing the rank constraint to the corresponding
nuclear norm constraint if the matrix $W$ is randomly sampled from a probability distribution
satisfying certain assumptions.



2 A semidefinite relaxation of the densest k-disjoint-clique problem

Given a graph $G = (V, E)$, a clique of $G$ is a pairwise adjacent subset of $V$. That is, $C \subseteq V$
is a clique of $G$ if $ij \in E$ for every pair of nodes $i, j \in C$. Let $K_N = (V, E, W)$ be a complete
graph with vertex set $V = \{1, 2, \dots, N\}$ and nonnegative edge weights $W_{ij} \in [0, 1]$ for all
$i, j \in V$. A $k$-disjoint-clique subgraph of $K_N$ is a subgraph of $K_N$ consisting of $k$ disjoint
complete subgraphs; i.e., the vertex set of each of these subgraphs is a clique. For any
subgraph $H$ of $K_N$, the density of $H$, denoted $d_H$, is the average edge weight incident at
a vertex in $H$:

$$d_H = \frac{1}{|V(H)|} \sum_{i \in V(H)} \sum_{j \in V(H)} W_{ij}.$$

The densest $k$-disjoint-clique problem concerns choosing a $k$-disjoint-clique subgraph of $K_N$
such that the sum of the densities of the subgraphs induced by the cliques is maximized.
Given a $k$-disjoint-clique subgraph with vertex set composed of cliques $C_1, \dots, C_k$, the sum
of the densities of the subgraphs induced by the cliques is equal to

$$\sum_{i=1}^k d_{C_i} = \sum_{i=1}^k \frac{v_i^T W v_i}{v_i^T v_i}, \qquad (2.1)$$

where $v_i$ is the characteristic vector of $C_i$. In the special case that $C_1, \dots, C_k$ defines a
partition of $V$ and $W_{ij} = 1 - \|x^{(i)} - x^{(j)}\|^2$ for a given set of $N$ vectors $\{x^{(1)}, \dots, x^{(N)}\}$ in
$\mathbf{R}^n$ with maximum distance between any two points at most one, we have

$$\sum_{\ell=1}^k \frac{v_\ell^T W v_\ell}{v_\ell^T v_\ell} = \sum_{\ell=1}^k \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \sum_{j \in C_\ell} \left(1 - \|x^{(i)} - x^{(j)}\|^2\right) = N - 2 \sum_{\ell=1}^k \sum_{i \in C_\ell} \|x^{(i)} - c^{(\ell)}\|^2,$$

since $\sum_{\ell=1}^k |C_\ell| = N$ and $\sum_{i, j \in C_\ell} \|x^{(i)} - x^{(j)}\|^2 = 2|C_\ell| \sum_{i \in C_\ell} \|x^{(i)} - c^{(\ell)}\|^2$, where
$c^{(\ell)} = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} x^{(i)}$ is the center of the vectors assigned to $C_\ell$ for all $\ell = 1, \dots, k$.
For this choice of $W$, the densest $k$-partition problem, i.e., finding a partition $C_1, \dots, C_k$
of $V$ such that the sum of the densities of the subgraphs induced by $C_1, \dots, C_k$ is maximized,
is equivalent to finding the partition of $V$ such that the sum of the squared Euclidean distances

$$f(\{x^{(1)}, \dots, x^{(N)}\}, \{C_1, \dots, C_k\}) = \sum_{\ell=1}^k \sum_{i \in C_\ell} \|x^{(i)} - c^{(\ell)}\|^2 \qquad (2.2)$$

from each vector $x^{(i)}$ to its assigned cluster center is minimized. Unfortunately, minimizing
$f$ over all potential partitions of $V$ is NP-hard and, thus, so is the densest $k$-partition
problem (see [25]). It should be noted that the complexity of the densest $k$-disjoint-clique
subgraph problem is unknown, although the problem of minimizing $f$ over all $k$-disjoint-clique
subgraphs has the trivial solution of assigning exactly one point to each cluster and
setting all other points to be outliers.
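The identity relating (2.1) and (2.2) above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the point set, the partition, and the helper names `density_sum` and `kmeans_cost` are hypothetical and chosen only for illustration. It builds $W_{ij} = 1 - \|x^{(i)} - x^{(j)}\|^2$ from points rescaled to have diameter one and compares the left-hand side of (2.1) with $N - 2f$.

```python
import numpy as np

def density_sum(W, clusters):
    """Sum of v^T W v / v^T v over the clusters (left-hand side of (2.1))."""
    total = 0.0
    for C in clusters:
        v = np.zeros(W.shape[0])
        v[C] = 1.0                          # characteristic vector of C
        total += v @ W @ v / (v @ v)
    return total

def kmeans_cost(points, clusters):
    """Sum of squared distances to cluster centers, the function f in (2.2)."""
    return sum(np.sum((points[C] - points[C].mean(axis=0)) ** 2) for C in clusters)

rng = np.random.default_rng(0)
N, n = 12, 3
points = rng.random((N, n))
diffs = points[:, None, :] - points[None, :, :]
sq_dist = np.sum(diffs ** 2, axis=-1)
d2max = sq_dist.max()
points /= np.sqrt(d2max)                    # rescale so the diameter is exactly one
sq_dist /= d2max
W = 1.0 - sq_dist                           # similarity weights in [0, 1]
clusters = [np.arange(0, 4), np.arange(4, 9), np.arange(9, 12)]   # a partition of V

lhs = density_sum(W, clusters)
rhs = N - 2 * kmeans_cost(points, clusters)
print(np.isclose(lhs, rhs))                 # True
```

Because the clusters partition all $N$ points, the constant term contributes exactly $N$; dropping a cluster (so that some points become outliers) breaks the identity.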

If we let $X$ be the $N \times k$ matrix with $i$th column equal to $v_i / \|v_i\|$, we have

$$\sum_{i=1}^k d_{C_i} = \mathrm{Tr}(X^T W X).$$



We call such a matrix $X$ a normalized $k$-partition matrix. That is, $X$ is a normalized
$k$-partition matrix if the columns of $X$ are the normalized characteristic vectors of $k$ disjoint
subsets of $V$. We denote by $\mathrm{npm}(V, k)$ the set of all normalized $k$-partition matrices of
$V$. We should note that the term normalized $k$-partition matrix is a slight misnomer; the
columns of $X \in \mathrm{npm}(V, k)$ do not necessarily define a partition of $V$ into $k$ disjoint sets, but
do define a partition of $V$ into the $k$ disjoint sets given by the supports of the columns
$X(:, 1), \dots, X(:, k)$ of $X$ and their complement. Using this notation, the densest $k$-disjoint-clique
problem may be formulated as the quadratic program

$$\max\{\mathrm{Tr}(X^T W X) : X \in \mathrm{npm}(V, k)\}. \qquad (2.3)$$

Unfortunately, quadratic programs with combinatorial constraints are NP-hard in general. 
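As a quick sanity check of the trace formulation (2.3), the sketch below (assuming NumPy; the helper `normalized_partition_matrix` and the example cliques are hypothetical) forms a normalized $k$-partition matrix for three disjoint subsets that leave one node uncovered, and confirms that $\mathrm{Tr}(X^T W X)$ equals the sum of densities in (2.1) and that the columns of $X$ are orthonormal.

```python
import numpy as np

def normalized_partition_matrix(N, cliques):
    """N x k matrix whose j-th column is the unit-norm characteristic vector of cliques[j]."""
    X = np.zeros((N, len(cliques)))
    for j, C in enumerate(cliques):
        X[C, j] = 1.0 / np.sqrt(len(C))
    return X

rng = np.random.default_rng(1)
N = 10
A = rng.random((N, N))
W = (A + A.T) / 2                                  # symmetric weights in [0, 1]
cliques = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]        # node 9 is not covered (an outlier)
X = normalized_partition_matrix(N, cliques)

trace_obj = np.trace(X.T @ W @ X)
density_obj = sum(W[np.ix_(C, C)].sum() / len(C) for C in cliques)
print(np.isclose(trace_obj, density_obj))          # True
print(np.allclose(X.T @ X, np.eye(len(cliques))))  # True: orthonormal columns
```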



The quadratic program (2.3) may be relaxed to a rank-constrained semidefinite program
using matrix lifting. We replace each column $x_i$ of $X$ with a rank-one semidefinite variable
$x_i x_i^T$ to obtain the new decision variable

$$X = \sum_{i=1}^k x_i x_i^T. \qquad (2.4)$$

The new variable $X$ has rank exactly equal to $k$ since the summands $x_i x_i^T$ are orthogonal
to each other. Moreover, since $\|x_i\|_1 = \sqrt{r_i}$, where $r_i$ is equal to the number of nonzero
entries of $x_i$, and each row of the normalized $k$-partition matrix has at most one nonzero entry,
the matrix $X$ has row sums equal to one for each vertex in the subgraph of $K_N$ defined by $X$
and equal to zero otherwise.
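The claimed properties of the lifted variable (2.4) are easy to confirm on a small instance. This is a minimal NumPy sketch on a hypothetical example with three cliques and one uncovered node; it builds $X = \sum_i x_i x_i^T$ and checks the rank, the row sums, the trace, entrywise nonnegativity, and positive semidefiniteness, which are exactly the properties used to build the relaxations that follow.

```python
import numpy as np

N, cliques = 10, [[0, 1, 2], [3, 4], [5, 6, 7, 8]]   # hypothetical example; node 9 is uncovered
k = len(cliques)
Xn = np.zeros((N, k))                                 # normalized k-partition matrix
for j, C in enumerate(cliques):
    Xn[C, j] = 1.0 / np.sqrt(len(C))

# Lifted variable (2.4): sum of the rank-one matrices x_i x_i^T.
X = sum(np.outer(Xn[:, j], Xn[:, j]) for j in range(k))

row_sums = X.sum(axis=1)
print(np.linalg.matrix_rank(X))                       # 3 (= k)
print(np.allclose(row_sums, [1] * 9 + [0]))           # True: 1 on clique nodes, 0 on node 9
print(np.isclose(np.trace(X), k))                     # True
print(X.min() >= 0, np.linalg.eigvalsh(X).min() > -1e-12)   # True True
```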



Therefore, we may relax (2.3) as the rank-constrained semidefinite program

$$\max\{\mathrm{Tr}(WX) : Xe \le e,\ \mathrm{rank}(X) = k,\ X \ge 0,\ X \succeq 0\}. \qquad (2.5)$$

Here $X \ge 0$ denotes entrywise nonnegativity, $\succeq$ denotes the partial order on $N \times N$
symmetric matrices induced by the cone of positive semidefinite matrices, defined by $U \succeq V$
if $U - V$ is positive semidefinite, and $e$ denotes the all-ones vector in $\mathbf{R}^N$. The
nonconvex program (2.5) may be relaxed further to a semidefinite program by replacing
the nonconvex constraint $\mathrm{rank}(X) = k$ with the linear constraint $\mathrm{Tr}(X) = k$:

$$\max\{\mathrm{Tr}(WX) : Xe \le e,\ \mathrm{Tr}(X) = k,\ X \ge 0,\ X \succeq 0\}. \qquad (2.6)$$



Note that a $k$-disjoint-clique subgraph with vertex set composed of disjoint cliques $C_1, \dots, C_k$
defines a feasible solution of (2.6) with rank exactly equal to $k$ and objective value
equal to (2.1) by

$$X^* = \sum_{i=1}^k \frac{v_i v_i^T}{v_i^T v_i}, \qquad (2.7)$$

where $v_i$ is the characteristic vector of $C_i$ for all $i = 1, \dots, k$. This feasible solution is
exactly the lifted solution corresponding to the cliques $\{C_1, \dots, C_k\}$ given by (2.4). We
should point out that the constraints of (2.6) are similar to those of the semidefinite
relaxation used by Peng and Wei [25] to approximate the minimum sum of squared Euclidean
distance partition, although our relaxation is derived differently.



The relaxation (2.6) may be thought of as a nuclear norm relaxation of (2.5). Indeed,
since the eigenvalues and singular values of a positive semidefinite matrix are identical,
every feasible solution $X$ satisfies

$$\mathrm{Tr}(X) = \sum_{i=1}^N \sigma_i(X) = \|X\|_*.$$

Moreover, since every feasible solution $X$ is symmetric and entrywise nonnegative with row
sums at most 1, we have

$$\|X\|_1 = \|X\|_\infty \le 1$$

for every feasible $X$. This implies that every feasible $X$ satisfies $\|X\| \le 1$, where $\|X\|$ denotes
the spectral norm, since $\|X\| \le \sqrt{\|X\|_1 \|X\|_\infty}$ (see [Ml Corollary 2.3.2]). Since $\|X\|_*$ is the
convex envelope of $\mathrm{rank}(X)$ on the set $\{X : \|X\| \le 1\}$ (see, for example, [23, Theorem 2.2]),
(2.6) is exactly the relaxation of (2.5) obtained by underestimating rank with the nuclear norm.
Many recent results have shown that the minimum-rank solution of a set of linear equations
$\mathcal{A}(X) = b$ is equal to the minimum nuclear norm solution, under certain assumptions on the
linear operator $\mathcal{A}$. We would like to prove analogous results for the relaxation (2.6). That is,
we would like to identify conditions on the input graph that guarantee recovery of the densest
$k$-disjoint-clique subgraph by solving (2.6).
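The two facts used in this argument can be illustrated numerically. The following NumPy sketch, on a hypothetical small instance, takes a convex combination of the planted solution (2.7) and the point $(k/N)I$ (both feasible for (2.6), so the combination is feasible too), and checks that its trace equals its nuclear norm and that its spectral norm is bounded by $\sqrt{\|X\|_1 \|X\|_\infty} \le 1$.

```python
import numpy as np

N, cliques = 10, [[0, 1, 2], [3, 4], [5, 6, 7, 8]]   # hypothetical example instance
k = len(cliques)
X_star = np.zeros((N, N))
for C in cliques:
    X_star[np.ix_(C, C)] = 1.0 / len(C)               # block v v^T / v^T v from (2.7)
# A convex combination of feasible points of (2.6) is again feasible.
X = 0.5 * X_star + 0.5 * (k / N) * np.eye(N)

nuc = np.linalg.norm(X, "nuc")                        # sum of singular values
spec = np.linalg.norm(X, 2)                           # largest singular value
bound = np.sqrt(np.linalg.norm(X, 1) * np.linalg.norm(X, np.inf))
print(np.isclose(np.trace(X), nuc))                   # True
print(spec <= bound + 1e-12 and bound <= 1.0)         # True
```

In this example the bound is attained (both sides equal 0.65), which is why the comparison uses a small tolerance.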



Ideally, a clustering heuristic should be able to correctly identify the clusters in data
that is known a priori to be clusterable. In our graph-theoretic model, this case corresponds
to a graph $G_S = (V, E, W)$ admitting a $k$-disjoint-clique subgraph with very high weights
on edges connecting nodes within the cliques and relatively low weights on edges between
different cliques. We focus our attention on input instances for the densest $k$-disjoint-clique
problem that are constructed to possess this structure. Let $K^*$ be a $k$-disjoint-clique
subgraph of $K_N$ with vertex set composed of disjoint cliques $C_1, C_2, \dots, C_k$. We consider
random symmetric matrices $W \in \mathbf{R}^{N \times N}$ with entries sampled independently from one of two
distributions $\Omega_1, \Omega_2$ as follows:



• For each $q = 1, \dots, k$, the entries of each block $W_{C_q, C_q}$ are independently sampled
from a probability distribution $\Omega_1$ satisfying $E[W_{ij}] = E[W_{ji}] = \alpha$ and $W_{ij} \in [0, 1]$
for all $i, j \in C_q$.

• All remaining entries of $W$ are independently sampled from a probability distribution
$\Omega_2$ satisfying $E[W_{ij}] = E[W_{ji}] = \beta$ and $W_{ij} \in [0, 1]$ for all $(i, j) \in (V \times V) \setminus \bigcup_{q=1}^k (C_q \times C_q)$.

That is, if the nodes $i, j$ are in the same planted clique, we sample the random variable
$W_{ij}$ from the probability distribution $\Omega_1$ with mean $\alpha$; otherwise, we sample $W_{ij}$ from the
distribution $\Omega_2$ with mean $\beta$. We say that such random matrices $W$ are sampled from
the planted cluster model. We should note that the planted cluster model is a generalization
of the planted $k$-disjoint-clique subgraph model considered in [3], as well as the stochastic
block/probabilistic cluster model considered in [19, 23, 30]. Indeed, the stochastic block
model is generated by independently adding edges within planted dense subgraphs with
probability $p$ and independently adding edges between cliques with probability $q$ for some
$p > q$. The planted $k$-disjoint-clique subgraph model is simply the stochastic block model
in the special case that $p = 1$. Therefore, choosing $\Omega_1$ and $\Omega_2$ to be Bernoulli distributions
with probabilities of success $p$ and $q$, respectively, yields $W$ sampled from the stochastic
block model.
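To make the planted cluster model concrete, here is a minimal NumPy sketch of the Bernoulli special case described above (the stochastic block model). The function name `sample_planted_cluster`, its arguments, and the choice to sample diagonal entries with the same rule as off-diagonal ones are illustrative assumptions. Each entry on or above the diagonal is drawn once and mirrored, so $W$ is symmetric with independent entries on and above the diagonal; nodes labelled $-1$ form the outlier set $C_{k+1}$.

```python
import numpy as np

def sample_planted_cluster(sizes, n_outliers, p, q, rng):
    """Symmetric W: Bernoulli(p) within planted cliques, Bernoulli(q) elsewhere.

    sizes[c] is the size of the (c+1)-th planted clique; the last n_outliers
    nodes form the outlier set C_{k+1} and carry the label -1.
    """
    labels = np.concatenate([np.full(r, c) for c, r in enumerate(sizes)]
                            + [np.full(n_outliers, -1)])
    N = labels.size
    same = (labels[:, None] == labels[None, :]) & (labels[:, None] >= 0)
    probs = np.where(same, p, q)                            # mean alpha = p or beta = q
    U = np.triu(rng.random((N, N)) < probs).astype(float)   # independent draws on/above diagonal
    W = U + U.T - np.diag(np.diag(U))                       # mirror to obtain a symmetric matrix
    return W, labels

rng = np.random.default_rng(0)
W, labels = sample_planted_cluster([40, 40, 40], 20, p=0.8, q=0.2, rng=rng)
same = (labels[:, None] == labels[None, :]) & (labels[:, None] >= 0)
print(W.shape, np.allclose(W, W.T))                         # (140, 140) True
print(round(W[same].mean(), 2), round(W[~same].mean(), 2))  # near 0.8 and 0.2
```

Setting `p=1.0` gives the planted $k$-disjoint-clique subgraph model mentioned above; other bounded distributions for $\Omega_1$, $\Omega_2$ would replace the two Bernoulli draws.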

The following theorem describes which partitions $\{C_1, C_2, \dots, C_{k+1}\}$ of $V$ yield random
symmetric matrices $W$ drawn from the planted cluster model such that the corresponding
planted $k$-disjoint-clique subgraph $K^*$ is the densest $k$-disjoint-clique subgraph and can be
found with high probability by solving (2.6).



Theorem 2.1 Suppose that vertex sets $C_1, \dots, C_k$ define a $k$-disjoint-clique subgraph $K^*$
of the complete graph $K_N = (V, E)$ on $N$ vertices and let $C_{k+1} := V \setminus (\cup_{i=1}^k C_i)$. Let $r_i := |C_i|$
for all $i = 1, \dots, k+1$, and let $\hat r = \min_{i=1, \dots, k} r_i$. Let $W \in \mathbf{R}^{N \times N}$ be a random symmetric
matrix sampled from the planted cluster model according to distributions $\Omega_1$ and $\Omega_2$ with
means $\alpha$ and $\beta$, respectively, satisfying

$$\alpha > \beta\left(\delta_{0, r_{k+1}} + 2(1 - \delta_{0, r_{k+1}})\right), \qquad (2.8)$$

where $\delta_{ij}$ is the Kronecker delta function defined by $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise. Let
$X^*$ be the feasible solution for (2.6) corresponding to $C_1, \dots, C_k$ defined by (2.7). Then
there exist scalars $c_1, c_2, \rho_1, \rho_2 > 0$ such that if

Ti < c,{a - Pff^ (2.9) 

for all i = 1, . . . ,k, and 

/ fe+i \ 

PiU^rJ + p^y/N + (3rk+i < C2{a - (3)f (2.10) 



then $X^*$ is the unique optimal solution for (2.6), and $K^*$ is the unique maximum density
$k$-disjoint-clique subgraph of $K_N$ corresponding to $W$, with probability tending exponentially
to 1 as $\hat r \to \infty$.



Note that condition (2.8) implies that $\alpha > \beta$ if $r_{k+1} = 0$ and $\alpha > 2\beta$ otherwise.
That is, if $\{C_1, \dots, C_k\}$ defines a partition of $V$, then the restriction that $\alpha > 2\beta$ can be
relaxed to $\alpha > \beta$.



The condition (2.10) cannot be satisfied unless $N = O(\hat r^2)$ and $r_{k+1} = O(\hat r)$. We now
provide a few examples of $r_1, \dots, r_{k+1}$ satisfying the hypothesis of Theorem 2.1.



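Suppose that we have $k$ cliques $C_1, \dots, C_k$ of size $r_1 = r_2 = \cdots = r_k = N^\epsilon$. Then
(2.10) implies that we may recover the $k$-disjoint-clique subgraph corresponding to
$C_1, \dots, C_k$ if $k = O(N^{\epsilon/2})$. Since the cliques $C_1, \dots, C_k$ are disjoint and contain $\Omega(N)$
nodes, we must have $k = \Omega(N^{1-\epsilon})$; combined with $k = O(N^{\epsilon/2})$, this forces $1 - \epsilon \le \epsilon/2$,
that is, $\epsilon \ge 2/3$. Therefore, taking $\epsilon = 2/3$, our heuristic may recover $O(N^{1/3})$ planted
cliques of size $N^{2/3}$.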

On the other hand, we may have cliques of different sizes. For example, suppose that 
we wish to recover ki cliques of size N^^^ and /c2 smaller cliques of size iV^/^. Then 



the right-hand side of (2.10) must be at least 



n{{k, + k2){k,N'/'' + k2N'/^)). 

Therefore, we may recover the planted cliques provided that ki = 0(A^^/^) and 
k2 = 0(Ari/3). 



We should point out that a significant consequence of our more general model for clustered
data is that our recovery guarantees are less powerful than those existing in the
literature. The bound on the minimum size of the planted clique recoverable by the
relaxation (2.6) provided by Theorem 2.1 is weaker ($\Omega(N^{2/3})$ versus $\Omega(N^{1/2})$) than that given
in [3, 19, 23] but matches that of [30]. However, among the existing recovery guarantees in
the literature, few consider noise in the form of diversionary nodes. Our relaxation (2.6) is
exact for input graphs containing up to $O(\hat r)$ noise nodes, far fewer than the bound $O(\hat r^2)$
provided by [3].



3 A convex relaxation of the densest k-disjoint-biclique problem

Given a bipartite graph $G = ((U, V), E)$, a pair of disjoint independent subsets $U' \subseteq U$,
$V' \subseteq V$ is a biclique of $G$ if the subgraph of $G$ induced by $(U', V')$ is complete bipartite.
That is, $(U', V')$ is a biclique of $G$ if $uv \in E$ for all $u \in U'$, $v \in V'$. A $k$-disjoint-biclique
subgraph of $G$ is a subgraph of $G$ with vertex set composed of $k$ disjoint bicliques of
$G$. Let $K_{M,N} = ((U, V), E, W)$ be a weighted complete bipartite graph with vertex sets
$U = \{1, 2, \dots, M\}$, $V = \{1, \dots, N\}$ and matrix of edge weights $W \in [0, 1]^{M \times N}$. We are
interested in identifying the densest $k$-disjoint-biclique subgraph of $K_{M,N}$ with respect to
$W$. We define the density of a subgraph $H = (U', V', E')$ of $K_{M,N}$ to be the total edge
weight of $H$ divided by the square root of the number of edges from $U'$ to $V'$:

$$d_H = \frac{1}{\sqrt{|E'|}} \sum_{uv \in E'} W_{uv}. \qquad (3.1)$$



Note that the density of $H$, as defined by (3.1), is not necessarily equal to the average
edge weight incident at a vertex of $H$, which is $\frac{2}{|U'| + |V'|} \sum_{uv \in E'} W_{uv}$; the two normalizations
differ, since $\sqrt{|E'|} < (|U'| + |V'|)/2$ whenever $|U'| \ne |V'|$ or $H$ is not complete. The goal of
the densest $k$-disjoint-biclique problem is to identify a set of $k$ disjoint bicliques of $K_{M,N}$
such that the sum of the densities of the complete subgraphs induced by these bicliques
is maximized. That is, we want to find a set of $k$ disjoint bicliques, with characteristic
vectors $(u_1, v_1), \dots, (u_k, v_k)$, maximizing the sum

$$\sum_{i=1}^k \frac{u_i^T W v_i}{\|u_i\| \|v_i\|}. \qquad (3.2)$$

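As in our analysis of the densest $k$-disjoint-clique problem, this problem may be posed as
the nonconvex quadratic program

$$\max\{\mathrm{Tr}(X^T W Y) : X \in \mathrm{npm}(U, k),\ Y \in \mathrm{npm}(V, k)\}. \qquad (3.3)$$

We symmetrize the weight matrix $W$ as

$$\tilde W = \begin{pmatrix} 0 & W \\ W^T & 0 \end{pmatrix}$$

and relax to the rank-constrained semidefinite program

$$\begin{aligned} \max\ & \tfrac{1}{2} \mathrm{Tr}(\tilde W Z) \\ \text{s.t. } & Z_{U,U}\, e \le e,\ Z_{V,V}\, e \le e, \\ & \mathrm{rank}(Z_{U,U}) = k,\ \mathrm{rank}(Z_{V,V}) = k, \\ & Z \ge 0,\ Z \succeq 0, \end{aligned} \qquad (3.4)$$

where $Z_{U,U}$ and $Z_{V,V}$ are the blocks of $Z$ with rows and columns indexed by $U$ and $V$,
respectively. Replacing the nonconvex rank constraints with trace constraints yields the
semidefinite relaxation

$$\begin{aligned} \max\ & \tfrac{1}{2} \mathrm{Tr}(\tilde W Z) \\ \text{s.t. } & Z_{U,U}\, e \le e,\ Z_{V,V}\, e \le e, \\ & \mathrm{Tr}(Z_{U,U}) = k,\ \mathrm{Tr}(Z_{V,V}) = k, \\ & Z \ge 0,\ Z \succeq 0. \end{aligned} \qquad (3.5)$$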
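The factor $\tfrac{1}{2}$ in (3.4) and (3.5) accounts for the fact that $\tilde W$ contains $W$ twice. The NumPy sketch below uses hypothetical data and a hypothetical choice of bicliques; it lifts a pair of normalized partition matrices $X$, $Y$ to $Z = \begin{pmatrix} X \\ Y \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}^T$ and confirms that $\tfrac{1}{2}\mathrm{Tr}(\tilde W Z) = \mathrm{Tr}(X^T W Y)$ and $\mathrm{Tr}(Z_{U,U}) = \mathrm{Tr}(Z_{V,V}) = k$.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 6, 8
W = rng.random((M, N))
W_tilde = np.block([[np.zeros((M, M)), W], [W.T, np.zeros((N, N))]])

# Hypothetical bicliques (U_1, V_1), (U_2, V_2); the remaining nodes are unassigned.
U_sets, V_sets = [[0, 1], [2, 3, 4]], [[0, 1, 2], [3, 4]]
k = len(U_sets)
X, Y = np.zeros((M, k)), np.zeros((N, k))
for j, (Ui, Vi) in enumerate(zip(U_sets, V_sets)):
    X[Ui, j] = 1.0 / np.sqrt(len(Ui))
    Y[Vi, j] = 1.0 / np.sqrt(len(Vi))

XY = np.vstack([X, Y])
Z = XY @ XY.T                                             # lifted variable
print(np.isclose(0.5 * np.trace(W_tilde @ Z), np.trace(X.T @ W @ Y)))       # True
print(np.isclose(np.trace(Z[:M, :M]), k), np.isclose(np.trace(Z[M:, M:]), k))  # True True
```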



As in our analysis of the densest $k$-disjoint-clique problem, we would like to identify sets of
program instances of the $k$-disjoint-biclique problem that may be solved using the semidefinite
relaxation (3.5). As before, we consider input graphs where it is known a priori that
a $k$-disjoint-biclique subgraph with large edge weights, relative to the edges of its complement,
exists. We consider random program instances generated as follows. Let $G^*$ be
a $k$-disjoint-biclique subgraph of $K_{M,N}$ with vertex set composed of the disjoint bicliques
$(U_1, V_1), \dots, (U_k, V_k)$. We construct a random matrix $W \in \mathbf{R}^{M \times N}$ with entries sampled
independently from one of two distributions $\Omega_1, \Omega_2$ as follows.

• If $u \in U_i$, $v \in V_i$ for some $i \in \{1, \dots, k\}$, then we sample $W_{uv}$ from the distribution
$\Omega_1$, with mean $\alpha$. If $u$ and $v$ are in different bicliques of $G^*$, then we sample $W_{uv}$
according to the probability distribution $\Omega_2$, with mean $\beta < \alpha$.

• The probability distributions $\Omega_1, \Omega_2$ are chosen such that $0 \le W_{uv} \le 1$ for all $u \in U$, $v \in V$.
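We say that such $W$ are sampled from the planted bicluster model. Note that $G^*$ defines a
feasible solution for (3.5) by

$$Z^* = \sum_{i=1}^k \begin{pmatrix} u_i / \|u_i\| \\ v_i / \|v_i\| \end{pmatrix} \begin{pmatrix} u_i / \|u_i\| \\ v_i / \|v_i\| \end{pmatrix}^T, \qquad (3.6)$$

where $u_i$, $v_i$ are the characteristic vectors of $U_i$ and $V_i$, respectively, for all $i = 1, \dots, k$.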
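A minimal NumPy sketch of the planted bicluster model and of the solution (3.6) follows. It assumes Bernoulli choices for $\Omega_1$ and $\Omega_2$; the function names and sizes are hypothetical. Rows labelled $-1$ and columns labelled $-2$ play the role of $U_{k+1}$ and $V_{k+1}$, so they never match a planted biclique. The script checks that $Z^*$ satisfies the constraints of (3.5) and that its objective value equals the sum (3.2).

```python
import numpy as np

def sample_planted_bicluster(m_sizes, n_sizes, m_out, n_out, p, q, rng):
    """W[u, v] ~ Bernoulli(p) if u in U_i and v in V_i for some i, else Bernoulli(q)."""
    rows = np.concatenate([np.full(m, i) for i, m in enumerate(m_sizes)] + [np.full(m_out, -1)])
    cols = np.concatenate([np.full(n, i) for i, n in enumerate(n_sizes)] + [np.full(n_out, -2)])
    same = rows[:, None] == cols[None, :]
    W = (rng.random(same.shape) < np.where(same, p, q)).astype(float)
    return W, rows, cols

def planted_solution(rows, cols, k):
    """Z* from (3.6): sum over bicliques of z_i z_i^T with z_i = (u_i/||u_i||, v_i/||v_i||)."""
    Z = np.zeros((rows.size + cols.size,) * 2)
    for i in range(k):
        u, v = (rows == i).astype(float), (cols == i).astype(float)
        z = np.concatenate([u / np.linalg.norm(u), v / np.linalg.norm(v)])
        Z += np.outer(z, z)
    return Z

rng = np.random.default_rng(3)
k = 2
W, rows, cols = sample_planted_bicluster([5, 6], [8, 9], 3, 4, p=0.9, q=0.1, rng=rng)
M, N = W.shape
Z = planted_solution(rows, cols, k)
ZUU, ZVV = Z[:M, :M], Z[M:, M:]
W_tilde = np.block([[np.zeros((M, M)), W], [W.T, np.zeros((N, N))]])

feasible = (np.all(ZUU.sum(axis=1) <= 1 + 1e-12) and np.all(ZVV.sum(axis=1) <= 1 + 1e-12)
            and np.isclose(np.trace(ZUU), k) and np.isclose(np.trace(ZVV), k)
            and Z.min() >= 0 and np.linalg.eigvalsh(Z).min() > -1e-10)
objective = 0.5 * np.trace(W_tilde @ Z)
sum_32 = sum(W[np.ix_(rows == i, cols == i)].sum() / np.sqrt((rows == i).sum() * (cols == i).sum())
             for i in range(k))
print(feasible, np.isclose(objective, sum_32))      # True True
```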



Note that $Z^*$ has objective value equal to (3.2). The following theorem describes which
partitions $\{U_1, \dots, U_{k+1}\}$ and $\{V_1, \dots, V_{k+1}\}$ of $U$ and $V$ yield random matrices $W$ drawn from
the planted bicluster model such that $Z^*$ is the unique optimal solution of the semidefinite
relaxation (3.5) and $G^*$ is the unique densest $k$-disjoint-biclique subgraph.



Theorem 3.1 Suppose that the vertex sets $(U_1, V_1), \dots, (U_k, V_k)$ define a $k$-disjoint-biclique
subgraph $G^*$ of the complete bipartite graph $K_{M,N} = ((U, V), E)$. Let $U_{k+1} := U \setminus (\cup_{i=1}^k U_i)$
and $V_{k+1} := V \setminus (\cup_{i=1}^k V_i)$. Let $m_i = |U_i|$ and $n_i = |V_i|$ for all $i = 1, \dots, k+1$, and
let $\hat n := \min_{i=1, \dots, k} n_i$. Let $Z^*$ be the feasible solution for (3.5) corresponding to $G^*$ given
by (3.6). Let $W \in \mathbf{R}^{M \times N}$ be a random matrix sampled from the planted bicluster model
according to distributions $\Omega_1$ and $\Omega_2$ with means $\alpha$, $\beta$ satisfying

$$\alpha > \beta\left(2(1 - \delta_{0, m_{k+1}} \delta_{0, n_{k+1}}) + \delta_{0, m_{k+1}} \delta_{0, n_{k+1}}\right). \qquad (3.7)$$

Suppose that there exist scalars $\{\tau_1, \dots, \tau_{k+1}\}$ such that $m_i = \tau_i^2 n_i$ for all $i \in \{1, \dots, k+1\}$
and

$$\alpha \tau_i > \beta \tau_j \qquad (3.8)$$

for all $i, j \in \{1, \dots, k+1\}$. Then there exist scalars $c_1, c_2, c_3, c_4 > 0$ depending only on
$\alpha$, $\beta$, and $\{\tau_1, \dots, \tau_{k+1}\}$ such that if

ni<ci{a-pyh^ (3.9)

and
k \ 1/2 

02 [k^ni\ + €3(1 + y/nk+i)\^ + (3Tk+ink+i < 04(0 - (3)h (3.10) 
then $Z^*$ is the unique optimal solution of (3.5) and $G^*$ is the unique maximum density
$k$-disjoint-biclique subgraph with respect to $W$, with probability tending exponentially to 1
as $\hat n$ tends to $\infty$.



For example, Theorem 3.1 implies that 0{N^^^) bicliques of size rh = n = N"^/^ can 
be recovered from a graph sampled from the planted bicluster model with up to 0{N^^^) 



diversionary nodes by solving (3.5). 



4 Proof of the guarantee for recovery of the maximum density k-disjoint-biclique subgraph



This section comprises a proof of Theorem 3.1. The proof of Theorem 2.1 is essentially
identical to that of Theorem 3.1, although with slight modifications made to accommodate
the different relaxation and to exploit the symmetry of the weight matrix $W$. A proof of
Theorem 2.1 can be found in the appendix.



4.1 Optimality Conditions 

In this section, we provide conditions for optimality of the proposed optimal solution $Z^*$
of the semidefinite relaxation (3.5) of the densest $k$-disjoint-biclique problem. We
begin with the following sufficient condition for the optimality of a feasible solution of
(3.5).



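Theorem 4.1 Let $Z$ be feasible for (3.5) and suppose that there exist some $\mu_1, \mu_2 > 0$,
$\lambda \in \mathbf{R}^M_+$, $\phi \in \mathbf{R}^N_+$, $\eta \in \mathbf{R}^{(M+N) \times (M+N)}_+$, and $S \in \Sigma^{M+N}_+$, the cone of
$(M+N) \times (M+N)$ symmetric positive semidefinite matrices, such that

$$\begin{pmatrix} \mu_1 I + \lambda e^T + e \lambda^T & -W \\ -W^T & \mu_2 I + \phi e^T + e \phi^T \end{pmatrix} - \eta = S, \qquad (4.1)$$

$$\lambda^T (Z_{U,U}\, e - e) = 0, \qquad (4.2)$$

$$\phi^T (Z_{V,V}\, e - e) = 0, \qquad (4.3)$$

$$\mathrm{Tr}(Z \eta) = 0, \qquad (4.4)$$

$$\mathrm{Tr}(Z S) = 0. \qquad (4.5)$$

Then $Z$ is optimal for (3.5).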



Note that

$$\begin{pmatrix} (k/M) I & 0 \\ 0 & (k/N) I \end{pmatrix}$$

is a strictly feasible solution of (3.5), and choosing $\lambda = 0$, $\eta = 0$, $\phi = 0$, and $\mu_1, \mu_2$
large enough that the left-hand side of (4.1) is positive definite shows that the dual of
(3.5) is strictly feasible. Thus, Slater's constraint qualification holds for (3.5) and its
dual. Therefore, a feasible solution $Z$ is optimal for (3.5) if and only if it satisfies the
Karush-Kuhn-Tucker conditions. Theorem 4.1 provides the necessary specialization to
(3.5) of these necessary and sufficient conditions (see, for example, [6, Section 5.5.3] or [29,
Theorem 28.3]).
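The gradient equation (4.1) gives $S$ explicitly once the other multipliers are fixed, which makes a candidate certificate easy to check numerically. The following NumPy sketch (the function names are hypothetical) forms $S$ from (4.1), tests dual feasibility, and reproduces the strict dual feasibility argument above: with $\lambda = 0$, $\phi = 0$, $\eta = 0$, the matrix $S$ is positive definite whenever $\mu_1$ and $\mu_2$ both exceed the largest singular value of $W$. It checks only the dual side; optimality of a particular $Z$ additionally requires primal feasibility and the complementarity conditions (4.2)-(4.5).

```python
import numpy as np

def dual_slack(W, mu1, mu2, lam, phi, eta):
    """The matrix S defined by the gradient equation (4.1)."""
    M, N = W.shape
    eM, eN = np.ones(M), np.ones(N)
    top_left = mu1 * np.eye(M) + np.outer(lam, eM) + np.outer(eM, lam)
    bottom_right = mu2 * np.eye(N) + np.outer(phi, eN) + np.outer(eN, phi)
    return np.block([[top_left, -W], [-W.T, bottom_right]]) - eta

def is_dual_feasible(W, mu1, mu2, lam, phi, eta, tol=1e-10):
    """True if lambda, phi, eta are entrywise nonnegative and S is positive semidefinite."""
    S = dual_slack(W, mu1, mu2, lam, phi, eta)
    nonneg = lam.min() >= -tol and phi.min() >= -tol and eta.min() >= -tol
    return bool(nonneg and np.linalg.eigvalsh(S).min() >= -tol)

rng = np.random.default_rng(4)
M, N = 4, 5
W = rng.random((M, N))
lam, phi, eta = np.zeros(M), np.zeros(N), np.zeros((M + N, M + N))
mu = np.linalg.norm(W, 2) + 1.0                      # exceeds the largest singular value of W
print(is_dual_feasible(W, mu, mu, lam, phi, eta))    # True
print(is_dual_feasible(W, 0.0, 0.0, lam, phi, eta))  # False: S has eigenvalue -||W||
```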

The proof of Theorem 3.1 uses techniques similar to those used in [3]. Specifically,
the proof of Theorem 3.1 relies on constructing multipliers satisfying the conditions of
Theorem 4.1. The multipliers $\lambda$, $\phi$, and $\eta$ will be constructed in blocks inherited from the
block structure of the proposed solution $Z^*$. Once the multipliers $\mu_1$, $\mu_2$, $\lambda$, $\phi$, and $\eta$ are
chosen, condition (4.1) provides an explicit formula for the multiplier $S$.



The dual variables must be chosen so that the complementary slackness condition (4.5)
is satisfied. The condition $\mathrm{Tr}(Z^* S) = 0$ is satisfied if and only if $Z^* S = 0$, since both $Z^*$
and $S$ are required to be positive semidefinite (see [31, Proposition 1.19]). Therefore, the
multipliers must be chosen so that the left-hand side of (4.1) is orthogonal to the columns
of $Z^*$. That is, we must choose the multipliers $\mu_1$, $\mu_2$, $\lambda$, $\phi$, and $\eta$ such that $S$, as defined
by (4.1), has nullspace containing the columns of $Z^*$. By the special block structure of $Z^*$,
this is equivalent to requiring

$$S(U_q \cup V_q,\ U_s \cup V_s) \begin{pmatrix} e / \sqrt{m_s} \\ e / \sqrt{n_s} \end{pmatrix} = 0 \qquad (4.6)$$

for all $q, s \in \{1, \dots, k\}$, where $(U_1, V_1), \dots, (U_k, V_k)$ are the planted bicliques corresponding
to the proposed solution $Z^*$. The gradient equation (4.1) and (4.6) provide explicit formulas
for the multipliers $\lambda$ and $\phi$. Moreover, since $\eta \ge 0$ and $Z^*$ is strictly positive on its diagonal
blocks, the complementary slackness condition (4.4) implies that all diagonal blocks
$\eta(U_q \cup V_q, U_q \cup V_q)$, $q = 1, \dots, k$, are equal to 0. To construct the remaining multipliers, we
parametrize the remaining blocks of $S$ using the vectors $y^{q,s}$ and $z^{q,s}$ for all $q \ne s$. These
vectors are chosen to be the solutions of the system of linear equations defined by
$SZ^* = Z^*S = 0$. We will show that this system is a perturbation of a linear system with
known solution and will use this known solution to obtain estimates of $y^{q,s}$ and $z^{q,s}$.

Once the multipliers are chosen, we must establish dual feasibility to prove that $Z^*$ is
optimal for (3.5). In particular, we must show that $\lambda$, $\phi$, and $\eta$ are nonnegative and $S$ is
positive semidefinite. To establish nonnegativity of $\lambda$, $\phi$, and $\eta$, we will show that $\lambda$, $\phi$, and
$\eta$ are strictly positive in expectation and close to this positive mean with extremely high
probability. To establish that $S$ is positive semidefinite, we will show that the diagonal
blocks of $S$ dominate the off-diagonal blocks with high probability.

Let $(U_1, V_1), \dots, (U_k, V_k)$ denote the vertex sets of the $k$-disjoint-biclique subgraph $G^*$
of the bipartite complete graph $K_{M,N} = ((U, V), E)$ with vertex sets $U$ and $V$ of size $M$
and $N$, respectively. Let $U_{k+1} := U \setminus (\cup_{i=1}^k U_i)$ and $V_{k+1} := V \setminus (\cup_{i=1}^k V_i)$. Let $W \in \mathbf{R}^{M \times N}$
be a random nonnegative matrix sampled from the planted bicluster model according to
distributions $\Omega_1, \Omega_2$ with means $\alpha, \beta$. Let $m_i := |U_i|$, $n_i := |V_i|$ for all $i = 1, \dots, k+1$, and
let $\hat m = \min_{i=1, \dots, k} m_i$, $\hat n = \min_{i=1, \dots, k} n_i$. Let $C_i := U_i \cup V_i$ and let $r_i := |C_i| = m_i + n_i$
for all $i = 1, \dots, k+1$. We assume that $m_i$ is equal to a scalar multiple of $n_i$ for all
$i \in \{1, \dots, k+1\}$. That is, $m_i = \tau_i^2 n_i$ for some $\tau_i > 0$ for all $i = 1, \dots, k+1$.
We next provide necessary background regarding the norms of random matrices.

4.2 Bounds on the norms of random matrices and sums of ran- 
dom variables 

We first recall a theorem of Geman providing a bound on the spectral norm of a 
random matrix with independent identically distributed (i.i.d.) entries of mean 0. 

Theorem 4.2 Let A be a lyn] x n random matrix with independent identically distributed 
(i.i.d.) entries sampled from a distribution with mean fi and variance such that Aij G 
[0, 1] for all i & {1, ... , \yn] }, j G {1, . . . , n} for fixed y G R+. Then, with probability at 
least 1 — Ci exp(— 02^."^^) where Ci > 0, C2 > 0, and C3 > depend on a and y, 

\\A — /uee'^ll < c^o\fn 

for some C4 > depending on y. 

Note that this theorem is not stated in this form in [15j, but can be deduced from the 
equations on pp. 255-256 by taking k = n'^ for a q satisfying (2a + 4)g < 1. 

A similar theorem due to Fiiredi and Komlos [H] is available for symmetric matrices. 

Theorem 4.3 Let A E TP be a random symmetric matrix with independent identically 
distributed (i.i.d.) entries sampled from a distribution with mean fi and variance cr^ such 
that Aij G [0, 1] for all i,j G {1, . . . ,n}. Then 

\\A — fiee^W < ScTi/n 

with probability at least 1 — exp(—cn^^^) where c depends only on a. 

As in the case of Geman's paper [15], this theorem is not stated exactly this way in 
but can be deduced by taking k = a^^^n^^^ and v = ay/n in the inequality 

P(max |A| > 2(7 i/n + v) < i/nexp(— A;f /(2cri/n + v)) 

on p. 237 of [H]. 

We next provide a theorem of Hoeffding (see |18l Theorem 1]), which provides a bound 
on the tail distribution of a sum of bounded, independent random variables. 

Theorem 4.4 (Hoeffding's Inequality) Let Xi, . . . ,Xm be independent identically dis- 
tributed (i.i.d.) variables sampled from a distribution satisfying < Xj < 1 for all 
i = l,...,m. Let S = Xi + ■ ■ ■ + Xm- Then 

Pr(|5-E[5]| >t) < 2exp f— ^ (4.7) 

\ m J 

for all t>0. 
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4.3 Choice of the multipUers and a sufficient condition for unique- 
ness and optimality 



The matrix S and, hence, A, 0, and rj will be constructed in blocks indexed by the vertex 
sets Ui, . . . ,Uk+i and Vi, . . . ,Vk+i. Note that the diagonal blocks of Zlj^ indexed by 
Ui, . . . ,Uk consist of multiples of the all ones matrix ee-^ and the remaining blocks are 
equal to 0. Therefore, Xuk+i = by (4.2). Similarly, the block structure of Z* implies that 

by and r]c„,c. 



'Vk+i = u uy ^^.o[ j anu ijCqfi^ = for all q 
choose A[/g so that Sjj^^Cq is orthogonal to Z 
that 



k by (4.4). For each q 



, . . . ,k, we 
In particular, it suffices to choose A such 



= Suq,Uge + TqSuq,Vqe = fiis + m^Xu^ + (A^ e)e - TqWu^y^e 



(4.8) 



for all q 



k. Rearranging (4.8) shows that Xu„ is the solution to the system 



{uiql + ee^)Xuq = TqWu^y^e - /iie 



(4.9) 



for all g = 1, . . . , fc. To obtain an explicit formula for A, we will use the Sherman-Morrision- 
Woodbury formula (see, for example, [HI Equation (2.1.4)]), stated in the following lemma. 



Lemma 4.1 Let A E R"^" be nonsingular and U,V E R,"'X'= be such that I + V'^A is 
nonsingular. Then 

{A + UV^)-^ = A-^ - A-^U{I + A-^U)-^V^ A-\ (4.10) 

Moreover, we have 

in the special case that k = 1 and V'^A^^U 7^ — 1 . 



For each g G {1, . . . , k}, applying (4.11 ) with A = niql, U = V = e shows that choosing 



TqWu^y^e 



1^1 + 



TgUq 



(4.12) 



ensures that the rows of Su„,c„ are orthogonal to the columns oi Z}j ^ . Similarly, choosing 



W, 



TqUq 



(4.13) 



forces the rows of Sy^^Cq to be orthogonal to the columns of Z^ ^ for all g G {1, . . . , k}. 
Note that 

^[^Uq\ = T^iOiTqUq - Hi) 



2m„ 



2 \Tq rriq 



(4.14) 
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for all g G {1, . . . , k} We choose /xi = e(a — (3)m for some scalar e > to be defined later 
to ensure that A is nonnegative in expectation. Similarly, 

for all q = 1, . . . ,k. To ensure that (p is nonnegative in expectation, we choose fi2 = 
e(a — f3)h. 

We next construct the multiplier t]. We set ?7Cfc+i,Cfe+i = and parametrize rjc^ Cs using 
the vectors y'^''^ and z'^'* for each q s. Specifically, we choose 

Vc,,c.=Il'^' + y'''e^ + e{z^'Y (4.16) 

where 



for some scalars 7iUq,Usy'^Uq,Vsy'^Vq,Usy T^Vq^Vs > to be defined later. The vectors y^'"* and tA'^ 
are chosen to be the solutions to the systems of linear equations imposed by the requirement 
that SZ* = 0. Specifically, we choose y'^'* and z^'"* to be solutions of the system of equations 
given by 5'c„c.^5.,c, = and Sc,,CqZ^^,Cs = 0- symmetry of S and Z*, y'^'' = z'*''? 

for all q ^ s. As in [3J, we show that this system of linear equations is a perturbation of a 
linear solution with known solution. Using the solution of the perturbed system we obtain 
bounds on y''''* and z'^''*, which are in turn used to establish that t] is nonnegative and S is 
positive semidefinte. 

For all g, s G {1, . . . , k + 1} such that q ^ s, let 



and let b = b'^''* G R'^<'.Cs be the vector defined by 

be, = {Scq,cs - E[Scq,c.]) (^^^) , be. = [Sccq ' E[Sc^,cJ) . (4.18 
The parameters 7rug,Us^'^Ug,Vs^'^Vq,Us^'^Vq,Vs > will be chosen so that 

{E[Scq,c.] - n-) (^"J = 0, {E[Sc.,Cq] - n-) (^"J = 0. 

We will establish that such a choice of 11'^''' exists in Lemma 14.31 



(4.19) 



Fix q,s G {1, . . . ,k} such that q ^ s. The requirement that the rows of Scq^c^ 
orthogonal to the columns of Z% ^ is equivalent to y = y"^'* and z = z'^''^ satisfying 



rris + r^n,)y + e(z^e + r^z^e) = b^,, (4.20) 
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where z^/ and zy are the entries of z indexed by Ug and Vg respectively. Similarly, the 
requirement that the columns of Scq,Cs are orthogonal to the rows of Z^^^Cs equivalent 
to y and z being solutions of the system of equations 



(mg + Tqnq)7. + (y^e + Tqyye)e = h 



Cs 



(4.21) 



if (4.19) holds, where yu and yy are the entries of y indexed by Ug and Vg respectively. 



Combining (4.20) and (4.21) shows that y and z must be chosen to be solutions of the 
system 

m,(l + l/r,)/ e(e; r,e)'^ \ fy^ 
(e; Tqe)e^ mg{l + 1/r,)/ 



b. 



(4.22) 



The system of equations in (4.22) is singular, with nullspace spanned by the vector (e; — e). 



It follows that (y + 76; z — 76) is a solution of (4.22) for any scalar 7 if (y; z) is a solution 

(4.23) 



of (4.22). In particular, there exists solution (y; z) of (4.22) such that 

0. 

We choose (y; z) to be a solution of the perturbed system 



T T 

e y — e z 



ms{l + 1/ts)I + 9ee^ e(e; r^e)^ - 
(e; Tge)e^ — 6ee^ mg{l + \/Tq)I + 6ee 



T 



b. 



(4.24) 



Each row of the system of equations (4.24) is equivalent to that of the system (4.22) with 
an additional term of the form 6{e^y — e^z). Therefore, the solution of (4.22) satisfying 



(4.23) is the unique solution to (4.24) for any 9 > such that (4.24) is nonsingular. In 



particular, (4.24) is nonsingular when 9 = 1. In this case, y and z are the unique solutions 
of the system 



ms{l + l/Ts)I + ee^ e(0; (r, - l)e)^ 
(0; (Tg - l)e)e^ mg(l + 1/r,)/ + ee^ 



b. 



(4.25) 



To obtain explicit formulas for y and z, we apply (4.10) with 
m,(l + l/r,)/ + ee^ 



A 



U 











{rg - l)e 







mg{l + l/Tg)I + ee^ ' ' 



(r, - l)e ^ / 







Let ojg := mq{l + l/r^), 0;^ := ms(l + 1/ts). Applying (4.11), with U = V 
and A = cu,/, shows that 



e, A = Ugl 



A-' 



(lM)(/-ee7(u;, + rg)) 

{l/uq){I -ee^/{uq + r,)) 



(4.26) 
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Multiplying (4.26) on the left by V"^ and on the right by U yields 



1 {Ts - l)nq/{uJs + Tq) 

{Tq-l)ns/iujq + rs) 1 



Let 



D :-- 



{Tq - 1){ts - l)ngns 



{u, + rq){uq + rs) 

It is easy to see that |Z}| < 1 for all choices of Tq, Tg > 0, and Ug, Ug. It follows that 

det(/ + V^A-^U) = 1-D^0. 
Therefore, / + V'^A~^U is nonsingular, with 



(4.27) 



(4.28) 



(I + V^A-^U)-^ 



1 



1 - D 



1 



{Tq - l)ns/{Uq + Tg 



~{Ts - l)nq/{Us + Tq) 
1 



(4.29) 



Substituting ( |4.26[ ) and ( |4.29[ ) into ([4.10|) shows that 

1 



{A + UV^)-^ = A-^ 



guee^ gi2&e^ gis^e^ gi4.ee^ 



1- D \ g2iee^ g22ee^ g23ee^ 5'24ee^ ' ' 



(4.30) 



where 



D 



gn ■-- 



913 
921 



(ts - l)ns 



Uq{uJq+rg){uJs+rq)' 

UJsiuJs + rq){ujq + rs) 
D 



923 



912 

9ii ■ 
, 5-22 : 

92A 



D f UJs + TTl, 



UsUq \ UJs+rq 

{ts - l)(a;g + m,) 

Uq{uJq+rs){u}s+rq) 
{Tq - l){Us + mq) 

UJs{uJs + rq){ujq + r,y 
D fuoq + nis 



(4.31) 



UJqHs \UJq + rs 



UJq{ujq + r^)' 

the blocks of columns of the second matrix in the right-hand side of (4.30) have widths 
niq, Uq, rris, and n^, respectively, and the blocks of rows of this matrix have size and r^. 
It follows that 



1 

Us 
1 



1 



Uq Uq{Uq + rsy 1-D 



iOs{Us+rq) 



1-D 
1 



(^iib^,e + guhl^e + g^hj^e + guhle)e (4.32) 
(^2ib^ e + ^22by e + ^23b^^e + c/24by^e)e. (4.33) 



For q G {1, . . . , k}, we set z^^^''^ = and choose y = y^+^-'s so that the columns of 
S{Ck+i, Cq) are orthogonal to (e; r^e). By our choice of 11^"'"^''^, y must satisfy 



ye 



T 



TqB 



e{\u-E[Xu]f 



-Wi 



+ /3ee^ 



]^k+l,q 



TqBy 
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Therefore, we choose 



k+l,q 



1 



1 

rriq + TqUg 



Uk+1 



(4.34) 
(4.35) 



We choose the remaining blocks of t] symmetrically. That is, we choose y'?''=+i = and set 



^g,k+i ^ yfc+1,5 all g = 1, . . . , fc. 



In summary, we choose the multipliers /ii,/i2 G R, A G R*''^, G R^, rj G R,Af+AfxM+Ar 
as follows: 



/ii = e(a — f3)rh 
ji2 = e(« — f3)h 



\ 0, q = k + l 



Vc„,Cs 



2 H — 



0, 

Yi<i,s ^ yg,«gT ^ e(z'?'^)^, if g ^ s 
0, otherwise, 



g = l,...,/c 

q = k + l 



(4.36) 
(4.37) 

(4.38) 

(4.39) 
(4.40) 



where e > is a scalar to be defined later, H'^'* is chosen so that (4.19) is satisfied and y*^'^ 
and tA'^ are given by (4.32), (4.33), (4.34), and (4.35) for all s. We choose S according 
to (4.1). To establish that 5* is positive semidefinite with high probability, we decompose 
S as the sum S = Si + S2 + + where 

5, 



Sl{Cq, Cs) 

S2{Cq, Cs) 
SsiCq, Cs) 



... - E[Sc 

fc+liL-fc+l ' 



0, 



yQ,Sf,T _ e(z«'")^, if g ^ s 

ifg = s = A; + l 
otherwise, 



0, 



Sr 



if g = s,g G {1, 
otherwise, 

E[Scg,Cg], ifg = s 

otherwise. 



k}, 



and 



^4 



/ill 

/i2/ 



(4.41) 

(4.42) 
(4.43) 

(4.44) 



We conclude with the following theorem, which provides a sufficient condition for op- 
timahty and uniqueness of the proposed solution Z* for (3.5). 
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Theorem 4.5 Suppose that the bicliques {Ui, Vi), {U2, V2), ■ ■ ■ , {Uk, Vk) form a k-disjoint- 
biclique subgraph G* of the bipartite complete graph Km,n = {{U,V), E) . Let rrii := \Ui\ 
and Hi := \Vi\ for alii = l,...,k. Let Uk+i ■= U \ {U^=iUi) and Vk+i := V \ {U^=iVi). 
Let W G R^^>^^ be a random matrix sampled from the planted bicluster model according 
to distributions ^1,^2 with means a, (3 satisfying ( |3.7 ). Suppose that rrii = r^Ui for all 
i E {1, . . . , A; + 1} such that the scalars {ti, . . . ,Tfc_|_i} satisfy (3.8) for all i,j G {1, . . . ,k + 
1}. Let Z* be the feasible solution for (3.5) corresponding to G* defined by (3.6). Let 



/ii, /i2, A, 0, be chosen according to (4.36), (4.37), (4.38), (4.39), and (4.40) such thatX,(f), 
and 7] are nonnegative. Let S be chosen according to (4.1) and decompose S as S = Ylt=i 
according to (4.41), (4.42), (4.43), and (4.44). Then there exist scalars ^1,^2,^3 > such 
that if 

<^iia-pyn^ (4.45) 



for all i 



k, and 



\Si\\ + Urik+iN)'/^ < - P)n 



(4.46) 



then Z* is optimal for (3.5) and G* is the densest k-disjoint-biclique subgraph of K 



M,N 



corresponding to W with probability tending exponentially to 1 as n 00. Moreover, if 



rise Wu„v^e > UgS Wu„v,e 



(4.47) 



for all q,s G {1, . . . , A;} such that q ^ s, then Z* is the unique optimal solution of ( |3.5 ) 
and G* is the unique densest k-disjoint-biclique subgraph of Km,n with probability tending 
exponentially to 1 as n —)■ 00. 



The remainder of this section consists of a proof of Theorem |4.5[ By construction, 
fi,X,(f),ri and S satisfy (4.1), (4.2), (4.3), (4.4), and (4.5). Moreover, /i, A,0, rj are nonneg- 



m 



ative by assumption. Therefore, it suffices to show that S is positive semidefinite if (4.45) 
and (4.46) are satisfied. To do so, we will establish that x-^Sx > for all x G R^+^^' 
this case. Fix x G R*^+^. We decompose x as x = Yli=i ^^i^i + ^k+i for some ipi, 
where 

1, iijeU, 
Ti if j G Vi 
otherwise, 



and Xfc+i is orthogonal to spanjxi, . . . ,Xfc}. Since Xj is a scalar multiple of a column of 



Z* for aU i 



k, span {xi, . . . , Xfc} C Null S. It follows that 



c^5x 



-I'S'xfc+i — 



Si^k^ 



1- 



(4.48) 



1=1 



Recall that 



54 



/ill 

fl2l 
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Therefore, 

Xfc+iS'4Xfe+i > mm{/ii,/i2}||xfc+if = e(a - /3) min{m, n}||xfc+ip. (4.49) 

We next establish similar bounds on x^_^^S2^k+i and xl'^^^S'sx^+i. We begin with the 
following lemma, which provides the necessary lower bound on x^^^^ 6*3x^+1. 

Lemma 4.2 There exists scalar c > such that 

xl^^Ss^ik+i > -c max y/n^Wxk+iW^ (4.50) 

i=l,...,k 

with probability tending exponentially to 1 as n ^ 00. 

Proof: Recall that Ss{C„ C,) = Sc„c,-E[Sc„c,] for all g G {1, ... , k} and S^iC,, C,) = 
if q^s or q = s = k + l. We have 



k 

Xfc+^S'sXfe+i > - ||S'3(Cg,Cg)||||xfc+i(C^)f > - max \\S3{Cg,Cg)\\\\xk+ 

^ q=l,--,k 
9=1 

Therefore, it suffices to show that ||S'3(Cq, Cq)\\ = 0{y/n^) for all g = 1, . . . , A;. 
Recall that 

(Ac;, - E[XuJ)e^ + e{Xu^ - E[XuJf -Wu„v, + aee 

Ug,Vq 



l\2 



Applying Theorem 4.2 with A = Wn,v„ — aee^ shows that there exists c depending only 



on Tq and the variance of the entries of W such that 



\\Wugyg - aee^ || < (4.51) 
with probability tending exponentially to 1 as n — )■ 00. Therefore, 

||^3(C^,,C^,)|| <£v^ + 2max|y^||Ac;,-E[A^J||,y^||0y, -E[0v^J||} (4.52) 

with probability tending exponentially to 1 as n 00, by the triangle inequality. 

It remains to show that ||Ac7, — £'[Ac/g]|| and \\(t>Vg — E[(f)Vq]\\ are bounded above by a 
scalar with high probability. Recall that 

Xug - E[Xug] = — [TqiWugyge - auqe) - -^{e^Wugy^e - am^nje ) . (4.53) 

Ulq y ZTqHq J 



Note that (4.51) implies that 

||W^c/„,y„ - a?^ge|| < \Wu y - aQi^\Jnq < cug (4.54) 
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with probability tending exponentially to 1 as n — )■ oo. On the other hand, applying 
Hoeffding's inequality (4.7) with S = e^Wugy^e and t = mq^ffTq shows that 



|e Wu,y,e - am„n„\ < m„Jn, 



(4.55) 



with probability exponentially close to 1 in h. It follows that there exists scalar ci > 
such that 

\\\u^-E[\u;\\\ <ci (4.56) 

with probability tending exponentially to 1 as n — i- oo. Similarly, there exists scalar C2 > 
such that 

Uv,-E[^vM<C2 (4.57) 



with probability tending exponent aially to 1 as n tends to oo. Substituting (4.56) and 



(4.57) into (4.52) shows that there exists scalar c, depending only on {ri, . . . ,Tfc+i} such 
that 

\\~Sz{C,,C,)\\<c^, 

for all g = 1, . . . , with probability tending exponentially to 1 as n — )■ oo. ■ 
The following lemma provides a similar lower bound on x^^^ 52X^+1. 



Lemma 4.3 Suppose that a, /5, ri, . . . , r^+i satisfy (3.7) and (3.8). Then, for all q,s E 



{1,...,A;} such that q s, there exist scalars 7TUg,Us^'^Uq,Vs}'^Vg,Us}'^Vq,Vs > c > 0, 
depending only on a, P,Ti, . . . , r^+i such that 



x^+iS'aXfc+i > -c||xfc+if A/rfc+iA^ 



(4.58) 



and 



{EiScM - n'i ( ) = 0. [Eiscc,] - n-) ( ) 



(4.59) 



Proof: Fix g, s G {1, . . . , fc} such that q s. Let tti : = 



and 7r4 := TTy^^y^. The system of equations defined by (4.59) is equivalent to 



( 1 1/^. 



1 



Ta 



\ 

Ts 1 

1/r, 

1/ 



/A - (3/ts \ 
A - f3/T, 



\ 



where 
A : 



a 
2 



1 1 

— + - 



2 {nir, 



1 



a /i2 



1 



q,Us ; 



(4.60) 



(4.61) 
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The system (4.60) is singular with solutions 





-K4 Hi / 




TTi = 








TqTs 2 \ 




1^2 = 


{(f)- TTi) /Ty 




TTs = 


(0 - vr4)/r. 


-/3 



+ 



+ ^(:^ + :^) («2) 

(4.63) 
(4.64) 



1 1 

+ 



We next show that there exists some choice of tt^ > 0, independent of n, such that (4.58) 
holds and tti, 712, 713 are bounded below by a positive scalar whenever (3.8) holds. 



Suppose that a, /3, ri, . . . ,Tk+i satisfy (3.8). Let tt^ := pi0 — p2(3 for some pi,p2 > 0. 
For 7r4 to be strictly positive, we need 



P2I3 < Pl0. 



(4.65) 



Substituting our choice of 714, into the formulae for 712 and vra given by (4.63) and (4.64) 
and rearranging shows that pi and p2 must satisfy 



P2/3 > /3max{rg, rj + (pi 



(4.66) 



for 7r2,7r3 to be positive. When (3.8) is satisfied 



f3max{Tg, Ts} > (^(^g + ^s) - f3max{Tg, rjj - e(« - /3) > 



for sufficiently small e > in (4.37). Choose p2 such that 

P2 = Pi0 - /«(/3max{rg,rJ - 0} 

for some k G (0, 1). Then = fi;(0 — /3max{rq, Tg}) is bounded below by a positive scalar 
depending only on a,(3,Tq, and by our choice of p2- Since our choice of pi,p2 satisfies 
(4.66), 7r2,7r3 are also bounded below by a positive scalar. Finally, since is at least a 
positive scalar, we can always take e > in (4.36) and (4.37) small enough that tti is also 
bounded below by a positive scalar depending only on a, (3, and r^. 



For every q G {1, . . . , k}, t\\ 
chosen so that 



TT 



7r2 



TT. 



o , anu. /1 4 — Ha 



are 



(2^("^'?^9-/^i) - 
-(/3 + 7r3)ee^ 



TTi ee 



1 



9 "9 



-(/3 + 7r2)ee^ 



0. 



It follows that TTi, 7r2, TTa, 714 are the solutions of the system 



1 





vr2 

V^4/ 



A - /3/r, 

- 



(4.67) 



22 



where A and are chosen as in (4.61). Since a > 2/3 by (3.7), 

1 /a 



i (a. 

71"! + 7r2/rg = A- /3 = — 



2mn 



is strictly positive for sufficiently small choice of e > in (4.36), and there is some choice 
of TTi and so that both are strictly positive. In particular, choosing 

vri = ^(A-/3/r,), vr^ = |(A - /3/r,) 
yields such a pair. Similarly, 



2r„ 



satisfy (4.67) and are strictly positive for sufficiently small e > in (4.37). 



It remains to show that this particular choice of 11 satisfies (4.58). Let := Xfc+i(f/g) 
and := Xfc+i(V^) denote the entries of x^+i indexed by f/g and Vq respectively, for all 
g = 1, . . . , /c + 1. For all g = 1, . . . , /c, we have 



T T 

e = -TgV^ e 



since x^+i is orthogonal to span {xi, . . . , x^}. Fix s G By our choice of 



Tr;^ , TTg , TTg , and we have 

'S'2(Cfc+i, Cs) = S'2(Cs,Cfc+i) 



Afc+i,se 
Wi,s/^s)e, 



e/r, 



■5 1 / T T\ 



Since (e; r^e) is orthogonal to Xjt+i(Cs) for all s G {1, . . . , A;}, we have 



y~^Xfc_(_i(Cfc+i)"^5'2(Cfc+i, Cs)xfc_|_i 



s=l 



2r, 



e r,e 



Xfe+i(Cfc+i)) ((e 



)xfc+i(Cs)) 



C 



||xfc+i(Cfc+i)||i(||xfe+i||i - ||xfc+i(Cfc+i)||i), 
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where Tmin — mmj=i_..._fe r^, r^ax — maxj=i,...,fc r^, and 

^min 

The optimization problem 

max {||wi||i||w2||i : ||wi||^ + ||w2||^ = 

wieR^i,w2eR'^2 

has optimal solution w* = / \/W[)e, vf^ = {^/\/2i^)e with optimal objective value 
equal to ^'^\/44/2. Taking Wi := Xfe+i(Cfe+i) and W2 = (xfc+i(Ci); . . . ;Xfe+i(Cfc)) and 
^ — ||xfe_|_i||, shows that 

||x,+i(C,+i)lli(l|x.+i||i - ||x,+i(C,+i)ili) < ^^^Vr.+i(iV-r,+0 
and, consequently, 

J2^k+i{Ck+ifS2{Ck+i,Cs)^k+i{Cs) > aAWV. (4.68) 

Similarly, 

Xfe+l(Cg)'^S'2(Cg, Cg)Xfc+l(Cg) 

= (v^e)^ (4r,„ - (4.69) 
for all g = 1, . . . , A;. For s e {1, . . . , /c} such that q s, we have 

Xfc+l(Cq)'^5'2(C5, Cs)Xfc+i(Cs) 

l-(^ + vrr)ee^ (0^'^ - J 
= (v^e)(vf e) (r,r,(A^'^ - Tr^) + /3(r, + r,) + r.vrr + r^TTg^'^' + - (4.70) 
-4(vje)(vfe)(0«'^-7rr). (4.71) 
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Here (|4.71l) is obtained by substituting (14.621), (14. 631), and (14. 641) into (14.701) . Let t;„ := vje 



for all g = 1, . . . , fc. Combining (4.68), (4. 69) and (4.71) shows that 



9=1 ^ 



)k k 
(/=1 s=n+l 



n„ 



k k 



> -c\\^k+i\\Wn+iN + 

q=l s=q+l 



VqVsl "^min 



q=l s=q+ 

An 



since Y!1=i > Zlg^s l^g^sl, where Tq^ := max{rg, rj. If armin > /^^i for alH = 1, . . . , /c 
then, for all e > sufficiently small and n sufficiently close to 1, we have 



ttT-min - - (1 - - fi:/3max{rg, rj 

4n 



k(5 max{r^, Tg} 



— 3] { Tfl \ 

> a^min - P max{rg, rj - (1 - k) (a - /3) max{rg, rj ( ^ + ^ ) - 

for all q ^ s. It follows immediately that :x.]^^^S2^k+i > — c||xfc+i W^y/r^+iN. 



Substituting (4.49), (4.50), and (4.58) into (4.48) shows that 



Xfc+iS'xfc+i > min{/ii,/i2} - 7 max y/ui - c^J r^+iN - \\Si\\ ||xfe+i|| . (4.72) 

\ i=l,...,k J 

Since /ii, /i2 are both a scalar multiple of n, where the scalar depends only on a, /3, ri, . . . , r^+i, 
there exist scalars ^i, ^2 > also depending only a, /3, ri, . . . , r^+i such that the right-hand 



side of (4.72) is nonnegative if \S\\ + CA/rvTiiV < ^2{p- — 1^)'^ and rii < ^i{a — (3) fi for 



alH = 1 k. 



To see that Z* is the unique optimal solution if (4.47) holds, suppose, on the contrary, 



that Z is also optimal for (3.5). The columns of Z lie in Null 5* since SZ = 0. Since 
x-^Sx = if and only if x^+i = 0, the nuUspace of 5* is spanned by the columns of Z*. 
Thus, we may write Z as 

k k 

i=i j=i 



(Jlj'X.i'X.j 



for some a E R^^'^. The fact that the row sums of the (V, V) block of Z are at most one 
implies that 



''"<; '^q^qq ~l~ 



(4.73) 



s=l 
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for all q = l,...,k. Since Z* and Z have the same objective value, there exists some 
g G {1, . . . ,k} such that 



e^Wu.ye 



s=l 



(4.74) 



Combining (4.73) and (4.74) shows that 



< e^Wny.e 



k 

— -E 



, '^g^'J ^q Tq^q 

\ s^q 



+ ^TsagsG^Wu^y^e 

I s=l 
/ s^q 



1 

- ^ TSagsinqe^Wu^ye - risB^Wu^y^e, 



1 s=l 
s+q 



contradicting (4.47). Therefore, Z* is the unique optimal solution of (3.5) if (4.47) is 
satisfied. 



4.4 Nonnegativity of the dual variables 

Let (f/i,Vi), (f/fc, Vfc) denote the vertex sets of afc-disjoint-bichque subgraph of the 
bipartite complete graph Km,n = {{U, V), E) with vertex sets U and V of size M and 
respectively. Let W G R^^^ be a random nonnegative matrix sampled from the planted 
bicluster model according to distributions ^2 with means a, (3. Let Uk+i = t/\(uf=if/i), 

Cj, Tj, m, and n, be defined as in Section 4.1 for all 

7] be 



,k+l. Suppose that a, /3, ri, . . . , r^+i satisfy (3.7) and (3.8). Let /^i, /X2, A 



i = 1 

chosen according to (4.36), (4.37), (4.38), (4.39), and (4.40). In this section, we establish 
that the entries of A, 0, and rj are nonnegative with extremely high probability. 

We first establish that the multipliers A and are nonnegative with high probability. 
The following lemma provides the necessary lower bound on the entries of A and (p. 

Lemma 4.4 There exist scalars ci, C2 > depending only on a, P , ti, . . . , Tk such that 

Xi > (ci - C2n-i/^), 4>j > (ci - 02^-^/^) (4.75) 



for all i E U \ Uk+i, j E V \ Vk+i with probability tending exponentially to 1 as h ^ oo. 



Proof: Fix g G {1, . . . ,k} and i G Ug. Recall that 



1 / e^Wu.ye 



TqUq 
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Applying Hoeff ding's inequality (4.7) with S = ^^gy^ Wij and t = n'q'^ shows that 



%2 OiTlq 



(4.76) 



with probability at least 1 — 2 exp(— 2r?,^/^) for all i G Uq. Moreover, applying (4.7) with 
S = e^Wu„y„G and t = mq.Jn^ yields 



G^Wu^y^e < auiqUq + mq^/n^ 



(4.77) 



with probability at least 1 — 2exp(— 2m). Combining ( 4.76[ ) and (4.77) shows that there 
exist scalars Ci, C2 > depending only on a, (3, and {ti, . . . ,Tk} such that 



A,; > 



1 ^l 



/ii) - 2rgn3/4^ > ci 



C2n 



-1/4 



with probability at least 1 — 2exp(— 2m) — 2 exp(— 2ri^/^). A similar argument shows that 
Ci and C2 may be chosen so that 

4>j > Ci — C2n^^^^ 

with probability at least 1 — 2exp(— 2m) — 2 exp(— 2ri^/^) for each j E Vq. Applying the 
union bound over all i E U , j E V completes the proof. ■ 

We now derive lower bounds on the entries of rj. Recall that r]{Cq, Cg) = H'^'^ — y^'*e^ — 
e(z'^''')^ for all q ^ s, where the entries of the matrix 11'''* are bounded below by a positive 
scalar with probability tending exponentially to 1 as n — )■ oo. Therefore, to prove that rj 
is nonnegative with high probability, it suffices to shows that the entries of y^'** and z'^''' 
tend to zero with high probability as h approaches oo. The following lemma provides the 
necessary upper bound on ||y''''^||oo and ||z'''''||oo- 



Lemma 4.5 There exists scalar c > such that 



+ z 



q,s\\ ^ 

oo 



C 



• r 1/4 1/4t 

mm|?T,g , Us I 



(4.78) 



for all q,s E {l,...,k} such that q s with probability tending exponentially to 1 as 
n — 7- oo. 



Proof: Fix q, s E {1, . . . ,k} such that q ^ s. The proof for the case when g or s is equal 
to + 1 follows by a similar argument. Without loss of generality, we may assume that 
riq < Ug. We first obtain an upper bound on ||y||oo = ||y'^'''||oo- By the triangle inequality, 
we have 



1 „- „ |bc el 1 



9 



Us " ujsi^s^r^ l-D 



\9ii I \huA + I^i2| |by e| + l^isl |b^^e| + I^mI |by^e|) 
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where gy, j = 1, . . . ,4 are defined as in (4.28) and (4.31). It is easy to see from (4.28) 
and (4.31) tliat |1 — L>| = 0(1) and \gij\ = 0{l/nqns) for all j = 1, ... ,4. Tlierefore it 
suffices to sfiow tfiat ||b(7^||oo = 0{^/n^) and tliat |b^_^e| and |b^^e| are bounded above by 
a scalar multiple of risy/fh} witli probability tending exponentially to 1 as Uq approaches to 
oo. 

We begin by deriving the necessary upper bound on |b^ e|. Recall that 



Taking the inner product of b^ with e yields 



2mn 



[nis + Tsns){e^Wu„,ve - amqUq) + jr^{mq + Tsnq){e^Wu,ye - anisn 



2ms 

-Ts{e^Wn,v,e - PrnqUq) - {e^Wu.ye - iSmsHq). 



(4.79) 



Recall that le'^lVf/ ye — am„n„| < rrin^/n^ and le^Wn^.v.G — am.,n.J < rris^/n^ with 



iiqibq 



probability at least 1 — 2pi by (4.77), where pi := 2exp(— 2m). Applying (4.7) with 
S = e^Wjj y^e and t = m„.Jn^ shows that 



(4.80) 



with probability at least 1 — pi- Similarly, applying (4.7) with S = e^Wu^ye and t = 
nqy/rn^ yields 

\e^Wu,y,e - (3msnq\ < nqy/m^ (4.81) 



with probability at least 1 — 2exp(— 2n) =: 1 —p2- Applying the triangle inquality and the 



union bound to (4.79) shows that 



b^ el < cins^/fTq 



(4.82) 



for some scalar ci > with probability at least 1 — 3pi — p2- Similarly, there exists scalar 
C2 > such that |b^^e| < C2nsJfiq with probability at least 1 — 3pi — p2- 



We next derive an upper bound on ||b, 



Co 1 1 oo • 



By the triangle inequality. 



|bt/ Jloo < rxis^Xy^ -E[At/J||oo + \{Xv, -^[At/J)^e| - r,||iyc/,,y.e - /3n,e| 



(4.83) 



Recall that 



Xi — E[Xi 



^TqUq 



{e^Wu^y^e - auiqUq) 
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for all i G Uq. By KTG^ 



< n 



3/4 



with probability at least 1 — 2 exp(— 2n^/^). On the other hand, 

\e^Wu„y^e - amqUql < rnqy/uq 



with probability at least l—pi by (4.55). Applying the union bound over all i G Uq shows 
that 

\\\u, - E[\uMoo < -cm-'/' (4.84) 
for some scalar ci > 0, with probability at least 1 — pi — 2mq exp(— 2n^/^). Similarly, 

\i\u, - E[Xu^]fe\ < ^ \e'^Wu,ye - «m,n,| < (4.85) 



with probability at least 1 — pi by (4.55). Finally, (4.76) implies that 



J2 - /^^^ 



< n' 



3/4 



(4.86) 



for all i G Uq with probability at least 1 — 2mq exp 

(-2n^/2)_ Substituting (Ol, (Osl) 



and (4.86) into (4.83) shows that there exists scalar C2 > such that 



>c/J loo < C2nsng 



1/4 



with probability tending exponentially to 1 as n — )■ oo. Following a similar argument, one 
can show that ||by_j||oo and ||bcJ|oo are bounded above by scalar multiples of UsUq and 
riq'^ ^ respectively, with probability tending exponentially to 1 as fi, — oo. It follows that 
there exists scalar c > such that 

llylloo < (c/2)n-V4 (4_g7^ 

with probability tending exponentially to 1 as n — t- oo. By a similar argument, ||z||oo < 
{c/2)nq^^^ with probability tending exponentially to 1 as n approaches oo. Applying the 
union bound one last time completes the proof. ■ 

Lemmas 4.4 and 4.5| imply that A,0, and t] are nonnegative with probability tending 
exponentially to 1 as n tends to oo. Therefore, if the planted bicliques (f/i, Vi), . . . , (f/fc, Vk) 
satisfy (4.45) and (4.46) then the corresponding feasible solution Z* is optimal for ( |3.5 ) 
with probability tending exponentially to 1 as n — )■ oo by Theorem 4.5 The following 



theorem states that the uniqueness condition given by (4.47) is also satisfied with high 



probability by matrices W sampled from the planted bicluster model. 
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Theorem 4.6 For all s G {1, . . . ,k} such that q ^ s, 



n 



with probability tending exponentially to 1 as n ^ oo. 
Proof: Fix q s. Recall that 



with probability at least 1 — by (4.55). Moreover, 



with probability at least 1 — pi by (4.80). It follows immediately that 



^^^e Wu^v^e - riqe Wu^ye > rUqUqUsia - (3 - {n^ ' + / )) 

with probability at least 1 — 2pi. Applying the union bound over all g 7^ s completes the 
proof. ■ 



4.5 Positive semidefiniteness of S 



We have established that /ii,/i2,A,0, and as defined by (4.36), (4.37), (4.38), (4.39), and 



(4.40) satisfy the hypothesis of Theorem 4.5 with extremely high probability. Moreover, 



we have established that the uniqueness condition (4.47) is satisfied with high probability. 



Therefore, it suffices to show that 5* as defined by (4.1) satisfies (4.46) to prove that Z* 



is the unique optimal solution of (3.5). In particular, we will derive the following upper 
bound on the spectral norm of Si. 



Theorem 4.7 There exist scalars Ci,C2 > such that 

1/2 

+ C2VN + PTk+lUk+l 



\si\\ < ci (^y^.^i^ 



(4. 



with probability tending exponentially to 1 as n approaches 00. 



This theorem, along with Theorems 4^ and 4^ and Lemmas 4A and |4.5 establishes 



Theorem 3.1 Indeed, if the right-hand side of (4.88) is at most ^2(« — /3)n — ^siNrik+i 



vl/2 



and Ui < ^i{a — PYn^ for each i = 1, . . . ,k then Theorems 4.5 and 4.6 and Lemmas 4.5 and 



4.4 imply that the planted fc-disjoint-biclique subgraph is the maximum density A;-disjoint- 



biclique subgraph of Km,n with respect to W and can be recovered by solving (3.5). The 



remainder of this section consists of a proof of Theorem 4.7 We decompose Si as 

5*1 = S*! + 5*2 + 5*3 + 5*4. 
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where Si G S^^"*"^, i = 1, . . . ,4, are defined as follows. We take 

S(U f/ ) = / ^^^^ ~ E[Xu^])^^ + e{Xu^ - E[Xu^]f, if q ^ s 
1' I 0, otherwise 

c / (</'v, - i^[0yj)e^ + e(0v. - i?[</>yj)^, if g 7^ 

^' ''^ \ 0, otherwise 

and set all remaining entries of Si to be 0. We choose 

for all g 7^ s and S2{Cq, Cq) = for all g G {1, . . . , A; + 1}. Next, let 

d (jj v\-l ^^^^ ~ ^'''^ ^iq = s, qe{l,...,k} 
^3{^g,ys) I ^eeT_p^^^^^^^ otherwise, 

where R^'"^ is a rriq x Uq random matrix with independent identically distributed (i.i.d.) 
entries sampled according to ^2, the distribution of the off-diagonal blocks of W. We 
choose S^iVq, Us) = S^^Us, Vq)^ and set all other entries of S3 equal to 0. Finally 5*4 is the 
correction matrix for the diagonal blocks of 5*3. That is, 

S^iUq, Vq) = W''^ - /3ee^, S^{Vq, Uq) = S,{Uq, Vq)^, 

for all g = 1, . . . , /c, 

S4{Uk+i,Vk+i) = S,{n+i,Uk+i)^ = -/3ee^, 
and all remaining entries of $S_4$ are 0. Note that

$$\|S_3\| = \|S_3(U, V)\| \le c_3\sqrt{N}$$

for some scalar $c_3$ with probability tending exponentially to 1 as $\hat n \to \infty$ by the block structure of $S_3$ and Theorem 4.2. Similarly,

$$\|S_4\| = \max\Big\{\max_{i=1,\dots,k}\|R^{(i)} - \beta ee^T\|,\ \beta\sqrt{m_{k+1}n_{k+1}}\Big\} \le \max\Big\{c_4\max_{i=1,\dots,k}\sqrt{n_i},\ \beta\sqrt{m_{k+1}n_{k+1}}\Big\} \le c_4\sqrt{N} + \beta\sqrt{m_{k+1}n_{k+1}}$$

for some scalar $c_4 > 0$ with probability tending exponentially to 1 as $\hat n \to \infty$ by Theorem 4.2. Therefore, there exists a scalar $c_2 > 0$ such that

$$\|\tilde S\| \le \|S_1 + S_2\| + c_2\sqrt{N} + \beta\sqrt{m_{k+1}n_{k+1}}$$

with probability exponentially close to 1 in $\hat n$. The fact that $\|S_1 + S_2\|^2 = O(k(n_1 + \cdots + n_{k+1}))$ is a consequence of the following theorem, which provides an upper bound on the norm of $S_1(C_q, C_s) + S_2(C_q, C_s)$ for all $q \ne s$.
Theorem 4.8 There exists a scalar $c > 0$ such that

$$\|S_1(C_q, C_s) + S_2(C_q, C_s)\| \le c\sqrt{\max\{n_q, n_s\}} \qquad (4.89)$$

for all $q, s \in \{1, \dots, k+1\}$, $q \ne s$, with probability tending exponentially to 1 as $\hat n$ approaches $\infty$.



Proof: Fix $q, s \in \{1, \dots, k\}$ such that $q \ne s$. The proof for when $q$ or $s$ is equal to $k+1$ follows a similar argument. Without loss of generality, we assume that $n_q \le n_s$.

To prove that (4.89) holds with high probability, it suffices to show that $\|S_1(C_q, C_s)\|$ and $\|S_2(C_q, C_s)\|$ are bounded above by a scalar multiple of $\sqrt{n_s}$ with high probability. Recall that there exist scalars $c_1, c_2$ such that

$$\|\lambda_{U_i} - E[\lambda_{U_i}]\| \le c_1, \qquad \|\phi_{V_i} - E[\phi_{V_i}]\| \le c_2$$

for all $i \in \{1, \dots, k\}$ with probability tending exponentially to 1 as $\hat n \to \infty$ by (4.56) and (4.57). It follows that

$$\|S_1(C_q, C_s)\| \le \tilde c_1\sqrt{n_s} \qquad (4.90)$$

for some scalar $\tilde c_1 > 0$ with probability tending exponentially to 1 as $\hat n \to \infty$.

It remains to obtain an upper bound on $\|S_2(C_q, C_s)\|$. By the triangle inequality, we have

$$\|y\| \le \frac{1}{n_s}\|b_{U_q}\| + \frac{\sqrt{m_q}\,|b_{U_q}^Te|}{n_s(n_s + n_q)} + \sqrt{m_q}\big(|g_{11}||b_{U_q}^Te| + |g_{12}||b_{V_q}^Te| + |g_{13}||b_{U_s}^Te| + |g_{14}||b_{V_s}^Te|\big). \qquad (4.91)$$

By (4.82) and the fact that each $|g_{ij}|$ is bounded above by a scalar multiple of $1/(n_qn_s)$, the last two summands on the right-hand side of (4.91) are bounded above by a scalar with probability tending exponentially to 1 as $\hat n \to \infty$. On the other hand,



$$\|b_{U_q}\| \le n_s\|\lambda_{U_q} - E[\lambda_{U_q}]\| + \sqrt{m_q}\,|(\phi_{V_s} - E[\phi_{V_s}])^Te| + \|W_{U_q,V_s}e - \beta n_se\|$$

with probability tending exponentially to 1 as $\hat n \to \infty$ by (4.56) and (4.85). Theorem 4.2 shows that there exists a scalar $\varphi$ such that

$$\|W_{U_q,V_s}e - \beta n_se\| \le \varphi n_s$$

with probability tending exponentially to 1 as $\hat n \to \infty$. It follows that there exists a scalar $c_1 > 0$ such that

$$\|b_{U_q}\| \le c_1n_s$$

with probability tending exponentially to 1 as $\hat n$ approaches $\infty$. By an identical argument, $\|b_{V_q}\| \le c_2n_s$ with probability tending exponentially to 1 as $\hat n \to \infty$ for some scalar $c_2 > 0$.
Therefore, there exists a scalar $c_3 > 0$ such that

$$\|y\| \le c_3$$

with probability tending exponentially to 1 as $\hat n \to \infty$. Similarly, one can show that there exist scalars $c_4, c_5 > 0$ such that $\|b_{U_s}\| \le c_4\sqrt{n_qn_s}$ and, consequently,

$$\|z\| \le c_5\Big(\frac{n_s}{n_q}\Big)^{1/2}$$

with probability tending exponentially to 1 as $\hat n$ approaches $\infty$. Applying the triangle inequality and the union bound shows that

$$\|S_2(C_q, C_s)\| \le \tilde c_2\sqrt{n_s}$$

for some scalar $\tilde c_2 > 0$, with probability tending exponentially to 1 as $\hat n$ tends to $\infty$. Applying the union bound over all $q \ne s$ completes the proof. ■



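To complete the proof of Theorem 4.7, note that

$$\|S_1 + S_2\|^2 \le \sum_{q=1}^{k+1}\sum_{s\ne q}\|S_1(C_q, C_s) + S_2(C_q, C_s)\|^2,$$

since the spectral norm of a block matrix is at most the square root of the sum of the squared norms of its blocks, and the diagonal blocks of $S_1 + S_2$ are zero. Applying Theorem 4.8 shows that there exists a scalar $c$ such that

$$\|S_1 + S_2\|^2 \le \sum_{q=1}^{k+1}\sum_{s\ne q}c\max\{n_q, n_s\} \le c\sum_{q=1}^{k+1}\sum_{s=1}^{k+1}(n_q + n_s) \le 2c(k+1)\sum_{s=1}^{k+1}n_s$$

with probability tending exponentially to 1 as $\hat n$ approaches $\infty$.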
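The last step relies on the inequality $\|A\|^2 \le \sum_{q,s}\|A(C_q, C_s)\|^2$, valid for any block matrix. A quick numerical sanity check, as a sketch: the helper names `spectral_norm` and `block_norm_check` are hypothetical, the spectral norm is estimated by plain power iteration rather than an exact SVD, and the random test matrix is an arbitrary stand-in, not the matrix $S_1 + S_2$ from the proof.

```python
import math
import random


def spectral_norm(a, iters=500):
    """Power-iteration estimate of the largest singular value of `a` (a list of rows).

    The estimate approaches ||a|| from below as `iters` grows.
    """
    m, n = len(a), len(a[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]   # a v
        w = [sum(a[i][j] * av[i] for i in range(m)) for j in range(n)]   # a^T a v
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(x * x for x in av))                             # ||a v||, v unit


def submatrix(a, rows, cols):
    return [[a[i][j] for j in cols] for i in rows]


def block_norm_check(sizes, seed=0):
    """Return (||A||^2, sum_{q,s} ||A(C_q, C_s)||^2) for a random square matrix A
    partitioned into consecutive index blocks C_1, C_2, ... of the given sizes."""
    rng = random.Random(seed)
    n = sum(sizes)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    starts = [sum(sizes[:q]) for q in range(len(sizes))]
    blocks = [list(range(st, st + sz)) for st, sz in zip(starts, sizes)]
    lhs = spectral_norm(a) ** 2
    rhs = sum(spectral_norm(submatrix(a, bq, bs)) ** 2 for bq in blocks for bs in blocks)
    return lhs, rhs


lhs, rhs = block_norm_check([2, 3, 2])
print(f"||A||^2 = {lhs:.4f} <= sum of squared block norms = {rhs:.4f}")
```

The same helper also confirms $\|ee^T\| = r$ for an $r \times r$ all-ones matrix, the fact behind bounds such as (A.62).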
A Appendix: Proof of Theorem 2.1

A.1 Optimality conditions and choice of multipliers

The proof of Theorem 2.1 is similar to that of Theorem 3.1, although with modifications made to exploit the symmetry of the weight matrix $W$. As before, the proof of Theorem 2.1 relies on showing that a proposed optimal solution satisfies the sufficient conditions for optimality given by the Karush-Kuhn-Tucker Theorem. The following theorem provides the necessary specialization of these optimality conditions to (2.6).



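Theorem A.1 Let $X$ be feasible for (2.6) and suppose that there exist some $\mu \ge 0$, $\lambda \in \mathbb{R}^N_+$, $\eta \in \mathbb{R}^{N\times N}_+$ and $S \in \Sigma^N_+$ such that

$$-W + \lambda e^T + e\lambda^T - \eta + \mu I = S \qquad (A.1)$$
$$\lambda^T(Xe - e) = 0 \qquad (A.2)$$
$$\mathrm{Tr}(X\eta) = 0 \qquad (A.3)$$
$$\mathrm{Tr}(XS) = 0. \qquad (A.4)$$

Then $X$ is optimal for (2.6).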


Let $K^*$ be a $k$-disjoint-clique subgraph of $K_N$ with vertex set composed of the disjoint cliques $C_1, \dots, C_k$ of sizes $r_1, \dots, r_k$, and let $X^*$ be the corresponding feasible solution of (2.6) defined by (2.1). Let $C_{k+1} := V \setminus (\cup_{i=1}^k C_i)$ and $r_{k+1} := N - \sum_{i=1}^k r_i$. Let $\hat r := \min_{i=1,\dots,k} r_i$. Let $W \in \Sigma^N$ be a random symmetric matrix with entries distributed according to $(\Omega_1)$ and $(\Omega_2)$. To show that $X^*$ is optimal for (2.6), we will construct multipliers $\mu > 0$, $\lambda \in \mathbb{R}^N$, $\eta \in \mathbb{R}^{N\times N}$, and $S \in \Sigma^N$ satisfying (A.1), (A.2), (A.3), and (A.4). Note that the gradient equation (A.1) provides an explicit formula for the multiplier $S$ for any choice of multipliers $\mu$, $\lambda$, and $\eta$.

We construct the multipliers $\lambda$, $\eta$, and $S$ in blocks indexed by the vertex sets $C_1, \dots, C_{k+1}$. The complementary slackness condition (A.4) implies that the columns of $X^*$ are in the nullspace of $S$, since $\mathrm{Tr}(XS) = 0$ if and only if $XS = 0$ for all positive semidefinite $X$, $S$. Since $X^*_{C_q,C_q}$ is a multiple of the all-ones matrix $ee^T$ for each $q = 1, \dots, k$, and all other entries of $X^*$ are equal to 0, the condition (A.4) implies that every block $S_{C_q,C_s}$, $q, s \in \{1, \dots, k\}$, must have row and column sums equal to 0. Moreover, since all entries of $X^*_{C_q,C_q}$ are nonzero, $\eta_{C_q,C_q} = 0$ for all $q = 1, \dots, k$ by (A.3).
For each $q \in \{1, \dots, k\}$, the condition $S_{C_q,C_q}e = 0$ is satisfied if

$$0 = S_{C_q,C_q}e = \mu e + r_q\lambda_{C_q} + (\lambda_{C_q}^Te)e - W_{C_q,C_q}e \qquad (A.5)$$

for all $q = 1, \dots, k$. Rearranging (A.5) shows that $\lambda_{C_q}$ is the solution to the system

$$(r_qI + ee^T)\lambda_{C_q} = W_{C_q,C_q}e - \mu e \qquad (A.6)$$
for all $q = 1, \dots, k$. Applying the Sherman-Morrison-Woodbury formula (4.11) with $A = r_qI$, $U = V = e$ shows that choosing

$$\lambda_{C_q} = \frac{1}{r_q}\Big(W_{C_q,C_q}e - \frac{1}{2}\Big(\mu + \frac{e^TW_{C_q,C_q}e}{r_q}\Big)e\Big) \qquad (A.7)$$

ensures that $S_{C_q,C_q}e = 0$ for all $q = 1, \dots, k$.
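The closed form (A.7) can be checked numerically. A minimal sketch, assuming a block with entries drawn uniformly from $[0, 1)$ (an arbitrary stand-in for $\Omega_1$) and an arbitrary value of $\mu$; the function names are hypothetical. The vector `residual` holds the row sums of $S_{C_q,C_q}$ from (A.5), which should vanish up to rounding error.

```python
import random


def lambda_block(w, mu):
    """Closed form (A.7) for one diagonal block W = W_{C_q,C_q} (a list of rows):
    lambda = (1/r) * (W e - (1/2) * (mu + e^T W e / r) * e)."""
    r = len(w)
    we = [sum(row) for row in w]          # W e
    shift = 0.5 * (mu + sum(we) / r)      # (1/2)(mu + e^T W e / r)
    return [(x - shift) / r for x in we]


def diag_block_row_sums(w, mu, lam):
    """Row sums of S_{C_q,C_q} = -W + lambda e^T + e lambda^T + mu I (eta vanishes here).

    Entry i equals -[W e]_i + r * lambda_i + e^T lambda + mu, the right-hand side of (A.5).
    """
    r = len(w)
    total = sum(lam)                      # e^T lambda
    return [-sum(w[i]) + r * lam[i] + total + mu for i in range(r)]


rng = random.Random(0)
r, mu = 6, 0.4
w = [[0.0] * r for _ in range(r)]
for i in range(r):
    for j in range(i, r):
        w[i][j] = w[j][i] = rng.random()  # symmetric block with entries in [0, 1)
lam = lambda_block(w, mu)
residual = diag_block_row_sums(w, mu, lam)
print(max(abs(v) for v in residual))
```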

We next construct $\eta$. Fix $q, s \in \{1, \dots, k+1\}$ such that $q \ne s$. To ensure that $S_{C_q,C_s}e = S_{C_s,C_q}e = 0$, we parametrize the entries of $\eta_{C_q,C_s}$ using the vectors $y^{q,s}$ and $z^{q,s}$. In particular, we take

$$\eta_{C_q,C_s} = \Big(\frac{\alpha}{2}\bar\delta_{q,k+1}\Big(1 - \frac{\mu}{\alpha r_q}\Big) + \frac{\alpha}{2}\bar\delta_{s,k+1}\Big(1 - \frac{\mu}{\alpha r_s}\Big) - \beta\Big)ee^T + y^{q,s}e^T + e(z^{q,s})^T. \qquad (A.8)$$

Here $\bar\delta_{ij} := 1 - \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta function defined by $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise. That is, we take $\eta_{C_q,C_s}$ to be the expected value of $\lambda_{C_q}e^T + e\lambda_{C_s}^T - W_{C_q,C_s}$, plus the parametrizing terms $y^{q,s}e^T$ and $e(z^{q,s})^T$. The vectors $y^{q,s}$ and $z^{q,s}$ are chosen to be the solutions to the systems of linear equations imposed by the requirement that $X^*S = SX^* = 0$. As before, this system of linear equations is a perturbation of a linear system with known solution. Using the solution of the perturbed system, we obtain bounds on $y^{q,s}$ and $z^{q,s}$, which are used to establish that $\eta$ is nonnegative and $S$ is positive semidefinite.

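Let

$$\tilde\eta_{C_q,C_s} := \lambda_{C_q}e^T + e\lambda_{C_s}^T - W_{C_q,C_s}. \qquad (A.9)$$

Note that the symmetry of $W$ implies that $\tilde\eta_{C_s,C_q} = \tilde\eta_{C_q,C_s}^T$. Let $b = b^{q,s} \in \mathbb{R}^{C_q\cup C_s}$ be defined by

$$b(C_q) = \tilde\eta_{C_q,C_s}e - E[\tilde\eta_{C_q,C_s}]e, \qquad (A.10)$$
$$b(C_s) = \tilde\eta_{C_s,C_q}e - E[\tilde\eta_{C_s,C_q}]e. \qquad (A.11)$$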

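We choose $y = y^{q,s}$ and $z = z^{q,s}$ to be solutions of the system

$$\begin{pmatrix} r_sI + \theta ee^T & (1-\theta)ee^T\\ (1-\theta)ee^T & r_qI + \theta ee^T\end{pmatrix}\begin{pmatrix} y\\ z\end{pmatrix} = b \qquad (A.12)$$

for some scalar $\theta > 0$ to be defined later. The requirement that the row sums of $S_{C_q,C_s}$ are equal to zero is equivalent to $y$ and $z$ satisfying the system of linear equations

$$0 = -r_sy_i - z^Te + r_s\Big(\lambda_i - \frac{1}{2r_q}(\alpha r_q - \mu)\Big) + \Big(\lambda_{C_s}^Te - \frac{1}{2}(\alpha r_s - \mu)\Big) - \big([W_{C_q,C_s}e]_i - \beta r_s\big) \qquad (A.13)$$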
for all $i \in C_q$. Similarly, the column sums of $S_{C_q,C_s}$ are equal to zero if and only if $y$ and $z$ satisfy

$$0 = -r_qz_i - y^Te + r_q\Big(\lambda_i - \frac{1}{2r_s}(\alpha r_s - \mu)\Big) + \Big(\lambda_{C_q}^Te - \frac{1}{2}(\alpha r_q - \mu)\Big) - \big([W_{C_s,C_q}e]_i - \beta r_q\big) \qquad (A.14)$$



for all $i \in C_s$. Note that the system of equations defined by (A.13) and (A.14) is equivalent to (A.12) in the special case that $\theta = 0$. However, when $\theta = 0$, the system of equations in (A.12) is singular, with nullspace spanned by the vector $(e; -e)$. It follows that $(y + ce; z - ce)$ is a solution of (A.12) for any scalar $c$ if $(y; z)$ is a solution of (A.12). In particular, there exists a solution $(y; z)$ of (A.12) such that

$$e^Ty - e^Tz = 0. \qquad (A.15)$$



When $\theta$ is nonzero, each row of the system (A.12) has an additional term of the form $\theta(e^Ty - e^Tz)$. Therefore, for $\theta > 0$ such that (A.12) is nonsingular, the solution $(y; z)$ satisfying (A.13), (A.14), and (A.15) is also the unique solution to (A.12), since the term $\theta(e^Ty - e^Tz)$ is zero. In particular, note that (A.12) is nonsingular for $\theta = 1$. For this choice of $\theta$, $y$ and $z$ are the unique solutions of the systems

$$(r_sI + ee^T)y = b_1 \qquad (A.16)$$
$$(r_qI + ee^T)z = b_2 \qquad (A.17)$$



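where $b_1 := b(C_q)$ and $b_2 := b(C_s)$. Applying the Sherman-Morrison-Woodbury formula (4.11) with $A = r_sI$, $U = V = e$ and $A = r_qI$, $U = V = e$ yields

$$y = \frac{1}{r_s}\Big(b_1 - \frac{b_1^Te}{r_q + r_s}e\Big) \qquad (A.18)$$
$$z = \frac{1}{r_q}\Big(b_2 - \frac{b_2^Te}{r_q + r_s}e\Big), \qquad (A.19)$$

respectively.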
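The argument above can be checked on a small example. When $b_1$ and $b_2$ are the row and column sums of the same block (so that $e^Tb_1 = e^Tb_2$, as the symmetry of $W$ guarantees), the $\theta = 1$ solution (A.18)-(A.19) also solves the singular $\theta = 0$ system, that is, (A.13) and (A.14), and it satisfies (A.15). The sketch below assumes an arbitrary random block in place of $\tilde\eta_{C_q,C_s} - E[\tilde\eta_{C_q,C_s}]$; the helper names are hypothetical.

```python
import random


def solve_yz(b1, b2):
    """(A.18)-(A.19): the theta = 1 solution of (A.12).

    b1 has length r_q (indexed by C_q) and b2 has length r_s (indexed by C_s).
    """
    rq, rs = len(b1), len(b2)
    y = [(x - sum(b1) / (rq + rs)) / rs for x in b1]
    z = [(x - sum(b2) / (rq + rs)) / rq for x in b2]
    return y, z


def theta0_residuals(y, z, b1, b2):
    """Residuals of the theta = 0 system, i.e. the row-sum equations (A.13) and the
    column-sum equations (A.14): r_s y_i + e^T z - b1_i and r_q z_j + e^T y - b2_j."""
    rq, rs = len(y), len(z)
    ey, ez = sum(y), sum(z)
    return ([rs * yi + ez - bi for yi, bi in zip(y, b1)],
            [rq * zj + ey - bj for zj, bj in zip(z, b2)])


# Hypothetical block playing the role of eta~_{C_q,C_s} - E[eta~_{C_q,C_s}].
rng = random.Random(2)
rq, rs = 3, 5
m = [[rng.uniform(-1.0, 1.0) for _ in range(rs)] for _ in range(rq)]
b1 = [sum(row) for row in m]                                  # row sums, as in (A.10)
b2 = [sum(m[i][j] for i in range(rq)) for j in range(rs)]     # column sums, as in (A.11)
y, z = solve_yz(b1, b2)
res_rows, res_cols = theta0_residuals(y, z, b1, b2)
print(max(abs(v) for v in res_rows + res_cols), sum(y) - sum(z))
```

If $e^Tb_1 \ne e^Tb_2$, the residuals no longer vanish, which is why the symmetry of $W$ matters here.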

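In summary, we choose the multipliers $\mu \in \mathbb{R}$, $\lambda \in \mathbb{R}^N$, $\eta \in \mathbb{R}^{N\times N}$ as follows:

$$\mu = \epsilon(\alpha - \beta)\hat r \qquad (A.20)$$

$$\lambda_{C_q} = \begin{cases}\dfrac{1}{r_q}\Big(W_{C_q,C_q}e - \dfrac{1}{2}\Big(\mu + \dfrac{e^TW_{C_q,C_q}e}{r_q}\Big)e\Big), & \text{if } q \in \{1, \dots, k\}\\ 0, & \text{if } q = k+1\end{cases} \qquad (A.21)$$

$$\eta_{C_q,C_s} = \begin{cases} E[\tilde\eta_{C_q,C_s}] + y^{q,s}e^T + e(z^{q,s})^T, & \text{if } q, s \in \{1, \dots, k+1\},\ q \ne s\\ 0, & \text{otherwise}\end{cases} \qquad (A.22)$$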
where $\epsilon > 0$ is a scalar to be defined later, $\tilde\eta_{C_q,C_s}$ is defined as in (A.9), and $y^{q,s}, z^{q,s}$ are given by (A.18) and (A.19) for all $q, s \in \{1, \dots, k+1\}$ such that $q \ne s$. We choose $S$ according to (A.1). Finally, we define the $(k+1) \times (k+1)$ block matrix $\tilde S \in \Sigma^N$ by

$$\tilde S(C_q, C_s) = \begin{cases} S_{C_q,C_s}, & \text{if } q, s \in \{1, \dots, k+1\},\ q \ne s\\ S_{C_{k+1},C_{k+1}} - \mu I, & \text{if } q = s = k+1\\ 0, & \text{otherwise.}\end{cases} \qquad (A.23)$$
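To see the construction of Section A.1 fit together, the sketch below assembles $\lambda$, $\eta$ and $S$ for a small random instance and checks the consequence $SX^* = 0$ of (A.4), together with the symmetry of $\eta$. It is a sketch under stated assumptions. $W$ has independent uniform $[0, 1)$ entries (a stand-in for the planted cluster model), and $\mu$ is fixed arbitrarily rather than by (A.20). Because $E[\tilde\eta]$ depends on the distributions, the block average of $\tilde\eta_{C_q,C_s}$ is used in its place; the row and column sum identities behind $SX^* = 0$ hold for any constant used there. It does not check nonnegativity of $\lambda$ and $\eta$ or positive semidefiniteness of $S$, which are the probabilistic parts of the argument. All function names are hypothetical.

```python
import random


def build_dual(w, sizes, mu):
    """Assemble lambda, eta and S following (A.7), (A.9)-(A.11), (A.18)-(A.19), (A.22), (A.1).

    `sizes` lists r_1, ..., r_k, r_{k+1}; the last block is the outlier set C_{k+1}.
    Assumption: the block average of eta~ stands in for E[eta~].
    """
    n = len(w)
    starts = [sum(sizes[:q]) for q in range(len(sizes))]
    blocks = [list(range(st, st + sz)) for st, sz in zip(starts, sizes)]
    lam = [0.0] * n                                   # lambda_{C_{k+1}} = 0, see (A.21)
    for bq in blocks[:-1]:
        r = len(bq)
        we = [sum(w[i][j] for j in bq) for i in bq]
        shift = 0.5 * (mu + sum(we) / r)
        for i, x in zip(bq, we):
            lam[i] = (x - shift) / r                  # (A.7)
    eta = [[0.0] * n for _ in range(n)]               # eta = 0 on diagonal blocks
    for q, bq in enumerate(blocks):
        for s, bs in enumerate(blocks):
            if q == s:
                continue
            rq, rs = len(bq), len(bs)
            tilde = [[lam[i] + lam[j] - w[i][j] for j in bs] for i in bq]   # (A.9)
            c = sum(map(sum, tilde)) / (rq * rs)      # stand-in for E[eta~]
            b1 = [sum(row) - c * rs for row in tilde]                       # (A.10)
            b2 = [sum(tilde[a][b] for a in range(rq)) - c * rq for b in range(rs)]
            y = [(x - sum(b1) / (rq + rs)) / rs for x in b1]               # (A.18)
            z = [(x - sum(b2) / (rq + rs)) / rq for x in b2]               # (A.19)
            for a, i in enumerate(bq):
                for b, j in enumerate(bs):
                    eta[i][j] = c + y[a] + z[b]       # (A.22)
    s_mat = [[-w[i][j] + lam[i] + lam[j] - eta[i][j] + (mu if i == j else 0.0)
              for j in range(n)] for i in range(n)]  # (A.1)
    return lam, eta, s_mat, blocks


def planted_solution(blocks, n):
    """X* with X*_{C_q,C_q} = e e^T / r_q for q = 1, ..., k and zeros elsewhere."""
    x = [[0.0] * n for _ in range(n)]
    for bq in blocks[:-1]:
        for i in bq:
            for j in bq:
                x[i][j] = 1.0 / len(bq)
    return x


rng = random.Random(1)
sizes = [3, 4, 2]                                     # r_1, r_2 and r_{k+1}
n = sum(sizes)
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        w[i][j] = w[j][i] = rng.random()
lam, eta, s_mat, blocks = build_dual(w, sizes, mu=0.5)
x_star = planted_solution(blocks, n)
sx = [[sum(s_mat[i][l] * x_star[l][j] for l in range(n)) for j in range(n)]
      for i in range(n)]
print("max |(S X*)_ij| =", max(abs(v) for row in sx for v in row))
```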



We conclude with the following theorem, which gives a sufficient condition for the proposed solution $X^*$ to be the unique optimal solution for (2.6) and for $K^*$ to be the unique maximum density $k$-disjoint-clique subgraph of $K_N$ corresponding to $W$.

Theorem A.2 Suppose that the vertex sets $C_1, \dots, C_k$ define a $k$-disjoint-clique subgraph $K^*$ of the complete graph $K_N = (V, E)$ on $N$ vertices and let $C_{k+1} := V \setminus (\cup_{i=1}^k C_i)$. Let $r_i := |C_i|$ for all $i = 1, \dots, k+1$, and let $\hat r = \min_{i=1,\dots,k} r_i$. Let $W \in \Sigma^N$ be a random symmetric matrix sampled from the planted cluster model according to distributions $\Omega_1, \Omega_2$ with means $\alpha, \beta$ satisfying (2.8). Let $X^*$ be the feasible solution for (2.6) corresponding to $C_1, \dots, C_k$ defined by (2.1). Let $\mu > 0$, $\lambda \in \mathbb{R}^N$, $\eta \in \mathbb{R}^{N\times N}$ be chosen according to (A.20), (A.21), and (A.22), and let $S$ be chosen according to (A.1). Suppose that the entries of $\lambda$ and $\eta$ are nonnegative. Then there exist scalars $c_1, c_2 > 0$ such that if

$$r_i \le c_1(\alpha - \beta)^2\hat r^2 \qquad (A.24)$$

for all $i = 1, \dots, k$, and

$$\|\tilde S\| \le c_2(\alpha - \beta)\hat r, \qquad (A.25)$$

then $X^*$ is optimal for (2.6), and $K^*$ is the maximum density $k$-disjoint-clique subgraph of $K_N$ corresponding to $W$. Moreover, if

$$r_s\,e^TW_{C_q,C_q}e > r_q\,e^TW_{C_q,C_s}e \qquad (A.26)$$

for all $q, s \in \{1, \dots, k\}$ such that $q \ne s$, then $X^*$ is the unique optimal solution of (2.6) and $K^*$ is the unique maximum density $k$-disjoint-clique subgraph of $K_N$.



Proof: By construction, $\mu$, $\lambda$, $\eta$, and $S$ satisfy (A.1), (A.2), (A.3), and (A.4). Moreover, $\mu$, $\lambda$, and $\eta$ are nonnegative by assumption. Therefore, to prove that $X^*$ is optimal for (2.6), it suffices to show that $S$ is positive semidefinite. To do so, we fix $x \in \mathbb{R}^N$ and decompose $x$ as $x = x_1 + x_2$ where

$$x_1(C_i) = \begin{cases}\omega_ie, & \text{if } i \in \{1, \dots, k\}\\ 0, & \text{if } i = k+1\end{cases}$$

for some $\omega \in \mathbb{R}^k$ chosen such that $x_2(C_i)$ is orthogonal to $e$ for all $i = 1, \dots, k$, and $x_2(C_{k+1}) = x(C_{k+1})$. By our choice of $x_1$ and $x_2$, we have

$$x^TSx = x_2^TSx_2 = \sum_{i=1}^k x_2(C_i)^T(\alpha ee^T - W_{C_i,C_i})x_2(C_i) + x_2^T(\tilde S + \mu I)x_2$$
$$\ge \Big(\mu - \max_{i=1,\dots,k}\|\alpha ee^T - W_{C_i,C_i}\| - \|\tilde S\|\Big)\|x_2\|^2$$
$$\ge \Big(\epsilon(\alpha - \beta)\hat r - \gamma\max_{i=1,\dots,k}\sqrt{r_i} - \|\tilde S\|\Big)\|x_2\|^2$$

for some scalar $\gamma > 0$ with probability tending exponentially to 1 as $\hat r \to \infty$, since there exists a scalar $\gamma > 0$ such that

$$\|\alpha ee^T - W_{C_i,C_i}\| \le \gamma\sqrt{r_i}$$

with probability tending exponentially to 1 as $\hat r \to \infty$ by Theorem 4.3. Therefore, there exist scalars $c_1, c_2$ such that if $r_i \le c_1(\alpha - \beta)^2\hat r^2$ and $\|\tilde S\| \le c_2(\alpha - \beta)\hat r$, then $x^TSx \ge 0$ for all $x \in \mathbb{R}^N$, with equality if and only if $x_2 = 0$, with probability tending exponentially to 1 as $\hat r \to \infty$. Therefore, $X^*$ is optimal for (2.6) with probability tending exponentially to 1 as $\hat r \to \infty$. Moreover, $v_i$ is in the nullspace of $S$ for all $i = 1, \dots, k$ by (A.4) and the fact that $X^* = \sum_{i=1}^k v_iv_i^T/r_i$. Since $x^TSx = 0$ if and only if $x_2 = 0$, the nullspace of $S$ is exactly equal to the span of $\{v_1, \dots, v_k\}$ and $S$ has rank equal to $N - k$.

To see that $X^*$ is the unique optimal solution for (2.6) if Assumption (A.26) holds, suppose, on the contrary, that $X \ne X^*$ is also optimal for (2.6). By (A.4), we have $\mathrm{Tr}(XS) = 0$, which holds if and only if $XS = 0$. Therefore, the row and column spaces of $X$ lie in the nullspace of $S$. Since $X \succeq 0$ and $X \ge 0$, we may write $X$ as

$$X = \sum_{i=1}^k\sum_{j=1}^k a_{ij}v_iv_j^T \qquad (A.27)$$

for some $a \in \mathbb{R}^{k\times k}$. The fact that $X$ satisfies $Xe \le e$ implies that

$$a_{qq}r_q + \sum_{s\ne q}a_{qs}r_s \le 1 \qquad (A.28)$$

for all $q = 1, \dots, k$. Moreover, since $\mathrm{Tr}(WX) = \mathrm{Tr}(WX^*)$, there exists some $q \in \{1, \dots, k\}$ such that

$$\sum_{s=1}^k a_{qs}e^TW_{C_q,C_s}e \ge \frac{1}{r_q}e^TW_{C_q,C_q}e. \qquad (A.29)$$
Combining (A.28) and (A.29) shows that

$$\frac{1}{r_q}e^TW_{C_q,C_q}e \le \frac{1}{r_q}\Big(1 - \sum_{s\ne q}a_{qs}r_s\Big)e^TW_{C_q,C_q}e + \sum_{s\ne q}a_{qs}e^TW_{C_q,C_s}e,$$

that is, $\sum_{s\ne q}a_{qs}\big(r_s\,e^TW_{C_q,C_q}e - r_q\,e^TW_{C_q,C_s}e\big) \le 0$, contradicting Assumption (A.26). Therefore, $X^*$ is the unique optimal solution of (2.6), as required. ■



A.2 Nonnegativity of $\lambda$ and $\eta$ in the planted case

Let $C_1, \dots, C_k$ denote the vertex sets of a $k$-disjoint-clique subgraph of the complete graph $K_N = (V, E)$ on $N$ vertices. Let $C_{k+1} := V \setminus (\cup_{i=1}^k C_i)$ and let $r_i := |C_i|$ for all $i = 1, \dots, k+1$. Let $\hat r := \min\{r_1, \dots, r_k\}$. Let $W \in \Sigma^N$ be a random symmetric matrix sampled from the planted cluster model according to distributions $\Omega_1, \Omega_2$ with means $\alpha, \beta$ satisfying (2.8). Let $\mu, \lambda, \eta$ be chosen as in (A.20), (A.21), and (A.22), respectively. We now establish that the entries of $\lambda$ and $\eta$ are nonnegative with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.

We begin by deriving lower bounds on the entries of $\eta$. To show that $\eta_{ij} \ge 0$ for all $i, j \in V$ with high probability, we will use the following lemma, which provides an upper bound on $\|y^{q,s}\|_\infty$ and $\|z^{q,s}\|_\infty$ for all $q, s \in \{1, \dots, k+1\}$ such that $q \ne s$, holding with probability tending to 1 as $\hat r$ tends to $\infty$.



Lemma A.1 There exists a scalar $c > 0$ such that

$$\|y^{q,s}\|_\infty + \|z^{q,s}\|_\infty \le c\hat r^{-1/4} \qquad (A.30)$$

for all $q, s \in \{1, \dots, k+1\}$ such that $q \ne s$ with probability at least

$$1 - 4(k+1)^2\big(7\exp(-\hat r) + N\exp(-2\hat r^{1/2})\big). \qquad (A.31)$$



Proof: Fix $q, s \in \{1, \dots, k\}$ such that $q \ne s$. The proof for the case when either $q$ or $s$ is equal to $k+1$ is analogous. We first obtain an upper bound on $\|y\|_\infty = \|y^{q,s}\|_\infty$. By the triangle inequality, we have

$$\|y\|_\infty \le \frac{1}{r_s}\Big(\|b_1\|_\infty + \frac{|b_1^Te|}{r_q + r_s}\Big). \qquad (A.32)$$

Hence, to obtain an upper bound on $\|y\|_\infty$, it suffices to obtain bounds on $\|b_1\|_\infty$ and
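$|b_1^Te|$. We begin with $\|b_1\|_\infty$. Recall that we have

$$b_i = r_s\Big(\lambda_i - \frac{1}{2r_q}(\alpha r_q - \mu)\Big) + \Big(\lambda_{C_s}^Te - \frac{1}{2}(\alpha r_s - \mu)\Big) - \Big(\sum_{j\in C_s}W_{ij} - \beta r_s\Big) \qquad (A.33)$$

for each $i \in C_q$. Note that

$$\lambda_{C_s}^Te - \frac{1}{2}(\alpha r_s - \mu) = \frac{1}{2r_s}e^TW_{C_s,C_s}e - \frac{\alpha r_s}{2}.$$

Applying Theorem 4.4 with $S = \mathrm{Tr}(W_{C_s,C_s})$, $t = r_s^{3/2}$, and with $S = \sum_{i\in C_s}\sum_{j\in C_s, j>i}W_{ij}$, $t = r_s^{3/2}/2$, shows that

$$\Big|\frac{1}{2r_s}e^TW_{C_s,C_s}e - \frac{\alpha r_s}{2}\Big| \le \frac{1}{2r_s}\Big(|\mathrm{Tr}(W_{C_s,C_s}) - \alpha r_s| + 2\Big|\sum_{i\in C_s}\sum_{j\in C_s, j>i}W_{ij} - \frac{\alpha r_s(r_s-1)}{2}\Big|\Big) \le r_s^{1/2} \qquad (A.34)$$

with probability at least

$$1 - 2\exp(-2r_s^2) - 2\exp\big(-r_s^2/(r_s - 1)\big) \ge 1 - p_1, \qquad (A.35)$$

where

$$p_1 := 2\exp(-2\hat r^2) + 2\exp(-\hat r). \qquad (A.36)$$

Next, applying (4.7) with $S = \sum_{j\in C_s}W_{ij}$ and $t = r_s^{3/4}$ shows that

$$\Big|\sum_{j\in C_s}W_{ij} - \beta r_s\Big| \le r_s^{3/4} \qquad (A.37)$$

with probability at least $1 - p_2$, where

$$p_2 := 2\exp(-2\hat r^{1/2}). \qquad (A.38)$$

Finally, applying (4.7) with $S = \sum_{j\in C_q}W_{ij}$ and $t = r_q^{3/4}$, together with (A.34), shows that

$$\Big|\lambda_i - \frac{1}{2r_q}(\alpha r_q - \mu)\Big| \le r_q^{-1/4} + r_q^{-1/2} \qquad (A.39)$$

with probability at least $1 - p_1 - p_2$. Combining (A.34), (A.37), and (A.39) and applying the union bound shows that

$$\|b_1\|_\infty \le r_s\big(r_q^{-1/4} + r_q^{-1/2}\big) + r_s^{1/2} + r_s^{3/4} \qquad (A.40)$$

with probability at least $1 - p_1 - 2r_qp_2$. By a similar argument,

$$\|b_2\|_\infty \le r_q\big(r_s^{-1/4} + r_s^{-1/2}\big) + r_q^{1/2} + r_q^{3/4} \qquad (A.41)$$

with probability at least $1 - p_1 - 2r_sp_2$.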

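We next obtain an upper bound on $|b_1^Te|$ and $|b_2^Te|$. We have

$$b_1^Te = r_s\Big(\lambda_{C_q}^Te - \frac{1}{2}(\alpha r_q - \mu)\Big) + r_q\Big(\lambda_{C_s}^Te - \frac{1}{2}(\alpha r_s - \mu)\Big) + \big(\beta r_qr_s - e^TW_{C_q,C_s}e\big). \qquad (A.42)$$

By (A.34) and the union bound, we have

$$\Big|\lambda_{C_q}^Te - \frac{1}{2}(\alpha r_q - \mu)\Big| \le r_q^{1/2} \qquad (A.43)$$
$$\Big|\lambda_{C_s}^Te - \frac{1}{2}(\alpha r_s - \mu)\Big| \le r_s^{1/2} \qquad (A.44)$$

with probability at least $1 - 2p_1$. Moreover, applying Theorem 4.4 with $S = e^TW_{C_q,C_s}e$ and $t = r_q\sqrt{r_s}$ shows that

$$\big|e^TW_{C_q,C_s}e - \beta r_qr_s\big| \le r_q\sqrt{r_s} \qquad (A.45)$$

with probability at least $1 - p_3$, where

$$p_3 := 2\exp(-2\hat r). \qquad (A.46)$$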
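The failure probabilities $p_1$, $p_2$, $p_3$ can be reproduced from a two-sided Hoeffding bound. We assume here, since it is consistent with the stated values, that (4.7) and Theorem 4.4 are Hoeffding-type inequalities of the form $P(|S - E[S]| \ge t) \le 2\exp(-2t^2/n)$ for a sum $S$ of $n$ independent terms with values in $[0, 1]$. The helper names are hypothetical. With equal block sizes $r_q = r_s = r$, the code checks the exponents $2r^2$, $r^2/(r-1)$, $2r^{1/2}$ and $2r$ behind (A.35), (A.38), and (A.46).

```python
import math


def hoeffding_tail(num_terms, t, width=1.0):
    """Two-sided Hoeffding bound 2 exp(-2 t^2 / (num_terms * width^2)) on
    P(|S - E[S]| >= t) for a sum S of independent terms with range `width`."""
    return 2.0 * math.exp(-2.0 * t * t / (num_terms * width * width))


def tail_terms(r):
    """Tail probabilities used in (A.34)-(A.46) for blocks of common size r."""
    p_trace = hoeffding_tail(r, r ** 1.5)                      # Tr(W_{C_s,C_s}), t = r^{3/2}
    p_upper = hoeffding_tail(r * (r - 1) / 2, r ** 1.5 / 2)    # strict upper triangle, t = r^{3/2}/2
    p_row = hoeffding_tail(r, r ** 0.75)                       # one row sum, t = r^{3/4}
    p_cross = hoeffding_tail(r * r, r * math.sqrt(r))          # e^T W_{C_q,C_s} e, t = r^{3/2}
    return p_trace, p_upper, p_row, p_cross


r = 9.0
p_trace, p_upper, p_row, p_cross = tail_terms(r)
print(p_trace + p_upper, p_row, p_cross)
```

The sum `p_trace + p_upper` is the quantity bounded by $p_1$ in (A.35), `p_row` is $p_2$ and `p_cross` is $p_3$ when $\hat r = r$.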



Substituting (A.43), (A.44), and (A.45) into (A.42), we have

$$|b_1^Te| \le r_s\sqrt{r_q} + 2r_q\sqrt{r_s} \qquad (A.47)$$

with probability at least $1 - 2p_1 - p_3$ by the union bound. Similarly,

$$|b_2^Te| \le r_s\sqrt{r_q} + 2r_q\sqrt{r_s} \qquad (A.48)$$

with probability at least $1 - 2p_1 - p_3$. Substituting (A.40) and (A.47) in (A.32) yields

$$\|y\|_\infty \le c_1\hat r^{-1/4} \qquad (A.49)$$

for some scalar $c_1 > 0$ with probability at least

$$1 - (3p_1 + 2r_qp_2 + p_3). \qquad (A.50)$$

Similarly, there exists a scalar $c_2 > 0$ such that

$$\|z\|_\infty \le c_2\hat r^{-1/4} \qquad (A.51)$$
with probability at least $1 - (3p_1 + 2r_sp_2 + p_3)$. Combining (A.49) and (A.51) and applying the union bound over all $q, s$ completes the proof. ■

As an immediate consequence of Lemma A.1, we have the following corollary, which states that $\eta$ is nonnegative with probability tending exponentially to 1 for sufficiently large values of $\hat r$.



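Corollary A.1 Suppose that $\alpha, \beta$ satisfy (2.8). Then the entries of the matrix $\eta$ are nonnegative with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.

Proof: Fix $i \in C_q$, $j \in C_s$ for some $q, s \in \{1, \dots, k+1\}$ such that $q \ne s$. Recall that

$$\eta_{C_q,C_s} = \Big(\frac{\alpha}{2}\bar\delta_{q,k+1}\Big(1 - \frac{\mu}{\alpha r_q}\Big) + \frac{\alpha}{2}\bar\delta_{s,k+1}\Big(1 - \frac{\mu}{\alpha r_s}\Big) - \beta\Big)ee^T + y^{q,s}e^T + e(z^{q,s})^T.$$

Therefore, if $\alpha > \beta$ and $r_{k+1} = 0$, or $\alpha > 2\beta$ and $r_{k+1} \ne 0$, Lemma A.1 implies that

$$\eta_{ij} \ge \frac{\alpha}{2}(2 - \delta_{q,k+1} - \delta_{s,k+1}) - \frac{\mu}{2\hat r}(2 - \delta_{q,k+1} - \delta_{s,k+1}) - \beta - \|y^{q,s}\|_\infty - \|z^{q,s}\|_\infty$$
$$\ge \frac{\alpha}{2}(2 - \delta_{q,k+1} - \delta_{s,k+1}) - \beta - \frac{\epsilon(\alpha - \beta)}{2}(2 - \delta_{q,k+1} - \delta_{s,k+1}) - c\hat r^{-1/4} > 0$$

for all sufficiently small $\epsilon > 0$ and sufficiently large $\hat r$, with probability at least (A.31). ■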
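The role of the conditions $\alpha > \beta$ (no outliers) and $\alpha > 2\beta$ (outliers present) can be seen by evaluating the constant term of (A.8) directly; by Lemma A.1, the remaining terms $y^{q,s}e^T + e(z^{q,s})^T$ are $O(\hat r^{-1/4})$ entrywise. A sketch with arbitrarily chosen $\alpha$, $\beta$, $\epsilon$, $\hat r$ (illustrative values, not taken from the paper) and hypothetical function names. With $\alpha = 0.5$, $\beta = 0.3$ the constant is negative for a pair of blocks involving $C_{k+1}$, since $\alpha < 2\beta$.

```python
def eta_constant(alpha, beta, mu, r_q, r_s, q_is_outlier=False, s_is_outlier=False):
    """Constant part of eta_{C_q,C_s} in (A.8):
    (alpha/2) dbar_q (1 - mu/(alpha r_q)) + (alpha/2) dbar_s (1 - mu/(alpha r_s)) - beta,
    where dbar = 0 for the outlier block C_{k+1} and 1 otherwise."""
    term_q = 0.0 if q_is_outlier else 0.5 * alpha * (1.0 - mu / (alpha * r_q))
    term_s = 0.0 if s_is_outlier else 0.5 * alpha * (1.0 - mu / (alpha * r_s))
    return term_q + term_s - beta


def mu_choice(alpha, beta, eps, r_hat):
    """(A.20): mu = eps (alpha - beta) r_hat."""
    return eps * (alpha - beta) * r_hat


r_hat, eps = 50, 0.1
mu = mu_choice(0.8, 0.3, eps, r_hat)
both_clusters = eta_constant(0.8, 0.3, mu, r_hat, r_hat)
with_outlier = eta_constant(0.8, 0.3, mu, r_hat, r_hat, s_is_outlier=True)
mu_weak = mu_choice(0.5, 0.3, eps, r_hat)
weak_outlier = eta_constant(0.5, 0.3, mu_weak, r_hat, r_hat, s_is_outlier=True)
print(both_clusters, with_outlier, weak_outlier)
```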



The following theorem provides a lower bound on the entries of $\lambda_{C_q}$ for all $q = 1, \dots, k$.

Theorem A.3 There exist scalars $c_1, c_2 > 0$ such that

$$\lambda_i \ge c_1 - c_2\hat r^{-1/4} \qquad (A.52)$$

for all $i \in V \setminus C_{k+1}$ with probability at least

$$1 - 4k\exp(-\hat r) - 2N\exp(-2\hat r^{1/2}). \qquad (A.53)$$



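Proof: Fix $q \in \{1, \dots, k\}$ and $i \in C_q$. Recall that

$$\lambda_i = \frac{1}{r_q}\sum_{j\in C_q}W_{ij} - \frac{1}{2r_q}\Big(\mu + \frac{1}{r_q}e^TW_{C_q,C_q}e\Big).$$

Applying (4.7) with $S = \sum_{j\in C_q}W_{ij}$ and $t = r_q^{3/4}$ yields

$$\sum_{j\in C_q}W_{ij} \ge \alpha r_q - r_q^{3/4} \qquad (A.54)$$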
with probability at least $1 - p_2$. Moreover, (A.34) implies that

$$-\frac{1}{2r_q}e^TW_{C_q,C_q}e \ge -\frac{1}{2}\big(\alpha r_q + 2\sqrt{r_q}\big) \qquad (A.55)$$

with probability at least $1 - p_1$. Combining (A.54) and (A.55) and applying the union bound shows that there exist scalars $c_1, c_2 > 0$ such that

$$\lambda_i \ge c_1 - c_2\hat r^{-1/4}$$

with probability at least $1 - p_1 - p_2$ for sufficiently small choice of $\epsilon > 0$ in (A.20). Applying the union bound over all $i \in V \setminus C_{k+1}$ completes the proof. ■



Note that Theorem A.3 implies that $\lambda \ge 0$ with probability tending exponentially to 1 as $\hat r$ tends to $\infty$. Therefore, $\mu$, $\lambda$, $\eta$ constructed according to (A.20), (A.21), and (A.22) are dual feasible for (2.6) with probability tending exponentially to 1 as $\hat r \to \infty$. The following theorem states that the uniqueness condition given by (A.26) is satisfied with high probability for sufficiently large $\hat r$.

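Theorem A.4 If $\hat r > 9/(\alpha - \beta)^2$, then

$$r_s\,e^TW_{C_q,C_q}e > r_q\,e^TW_{C_q,C_s}e \qquad (A.56)$$

for all $q, s \in \{1, \dots, k\}$ such that $q \ne s$ with probability at least $1 - 6k^2\exp(-\hat r)$.

Proof: Fix $q \ne s$. Recall that

$$e^TW_{C_q,C_q}e \ge \alpha r_q^2 - 2r_q^{3/2} \qquad (A.57)$$

with probability at least $1 - p_1$ by (A.34). Similarly,

$$e^TW_{C_q,C_s}e \le \beta r_qr_s + r_q\sqrt{r_s} \qquad (A.58)$$

with probability at least $1 - p_3$ by (A.45). Combining (A.57) and (A.58) yields

$$r_s\,e^TW_{C_q,C_q}e - r_q\,e^TW_{C_q,C_s}e \ge r_sr_q^2\big(\alpha - \beta - 2r_q^{-1/2} - r_s^{-1/2}\big) \ge r_sr_q^2\big(\alpha - \beta - 3\hat r^{-1/2}\big) > 0$$

if $\hat r > 9/(\alpha - \beta)^2$, with probability at least $1 - p_1 - p_3$. Applying the union bound over all choices of $q, s$ completes the proof. ■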
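The threshold in Theorem A.4 comes from requiring the final lower bound to be positive: $\alpha - \beta - 3\hat r^{-1/2} > 0$ exactly when $\hat r > 9/(\alpha - \beta)^2$. A small arithmetic sketch with the illustrative values $\alpha = 0.75$, $\beta = 0.25$, for which the threshold is $\hat r > 36$; the function names are hypothetical.

```python
import math


def uniqueness_margin(alpha, beta, r_hat):
    """Lower bound alpha - beta - 3 r_hat^{-1/2} on
    (r_s e^T W_qq e - r_q e^T W_qs e) / (r_s r_q^2) from the proof of Theorem A.4."""
    return alpha - beta - 3.0 / math.sqrt(r_hat)


def size_threshold(alpha, beta):
    """Theorem A.4 requires r_hat > 9 / (alpha - beta)^2."""
    return 9.0 / (alpha - beta) ** 2


alpha, beta = 0.75, 0.25
print(size_threshold(alpha, beta))
for r_hat in (35, 36, 37, 100):
    print(r_hat, uniqueness_margin(alpha, beta, r_hat))
```

At $\hat r = 36$ the margin is exactly zero, matching the strict inequality in the hypothesis.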



We have shown that $\mu$, $\lambda$, $\eta$ constructed according to (A.20), (A.21), and (A.22) are dual feasible for (2.6) and the uniqueness condition (A.26) is satisfied with probability tending exponentially to 1 as $\hat r \to \infty$. In the next subsection, we derive an upper bound on the norm of $\tilde S$ and use this bound to obtain conditions ensuring dual feasibility of $S$ and, hence, optimality of $X^*$ for (2.6).



A.3 An upper bound on $\|\tilde S\|$

Suppose that the random matrix $W$ is sampled from the planted cluster model corresponding to partition $C_1, \dots, C_{k+1}$ of the vertices of the complete graph $K_N = (V, E)$ on $N = |V|$ vertices according to distributions $\Omega_1, \Omega_2$ with means $\alpha, \beta$ satisfying (2.8). Let $r_i = |C_i|$ for all $i = 1, \dots, k+1$. Let $\hat r := \min_{i=1,\dots,k} r_i$. Let $\mu \in \mathbb{R}_+$, $\lambda \in \mathbb{R}^N$, $\eta \in \mathbb{R}^{N\times N}$, and $S \in \Sigma^N$ be defined as in Section A.1. In this section, we derive an upper bound on $\|\tilde S\|$, which will be used to verify that the conditions on the partition $C_1, \dots, C_{k+1}$ imposed by (2.9) and (2.10) ensure that the $k$-disjoint-clique subgraph of $K_N$ composed of the cliques $C_1, \dots, C_k$ is the unique maximum density $k$-disjoint-clique subgraph of $K_N$ with respect to $W$ and can be recovered by solving (2.6) with probability tending exponentially to 1 as $\hat r \to \infty$.

In particular, we will prove the following theorem.



Theorem A.5 There exist scalars $\rho_1, \rho_2 > 0$ such that

$$\|\tilde S\| \le \rho_1\Big(k\sum_{i=1}^{k+1}r_i\Big)^{1/2} + \rho_2\sqrt{N} + \beta r_{k+1} \qquad (A.59)$$

with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.

This theorem, along with Theorems A.2 and A.3, and Corollary A.1, establishes Theorem 2.1. Indeed, if the right-hand side of (A.59) is at most $c_2(\alpha - \beta)\hat r$ and $r_i \le c_1(\alpha - \beta)^2\hat r^2$ for each $i = 1, \dots, k$, then Theorems A.2 and A.3, and Corollary A.1 imply that the $k$-disjoint-clique subgraph given by $C_1, \dots, C_k$ is the densest $k$-disjoint-clique subgraph corresponding to $W$ and can be recovered by solving (2.6).



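The remainder of this section consists of a proof of Theorem A.5. We decompose $\tilde S$ as

$$\tilde S = S_1 + S_2 + S_3 + S_4,$$

where $S_i \in \Sigma^N$, $i = 1, \dots, 4$, are $(k+1)$ by $(k+1)$ block matrices such that

$$S_1(C_q, C_s) = \begin{cases} S_{C_q,C_s}, & \text{if } q, s \in \{1, \dots, k+1\},\ q \ne s\\ 0, & \text{otherwise,}\end{cases}$$

$$S_2(C_q, C_s) = \begin{cases}\beta ee^T - W_{C_q,C_s}, & \text{if } q = s = k+1\\ \beta ee^T - R(C_q, C_s), & \text{otherwise,}\end{cases}$$

$$S_3(C_q, C_s) = \begin{cases} 0, & \text{if } q = s = k+1\\ R(C_q, C_s) - \beta ee^T, & \text{otherwise,}\end{cases}$$

$$S_4(C_q, C_s) = \begin{cases} -\beta ee^T, & \text{if } q = s = k+1\\ 0, & \text{otherwise,}\end{cases}$$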
where $R \in \Sigma^N$ is a random symmetric matrix with independent identically distributed (i.i.d.) entries sampled according to $\Omega_2$. By Theorem 4.3, there exist $\rho_2, \kappa_1, \kappa_2 > 0$ such that

$$\|S_2\| + \|S_3\| \le \rho_2\sqrt{N} \qquad (A.60)$$

with probability at least

$$1 - \kappa_1\exp(-\kappa_2N^{1/2}). \qquad (A.61)$$
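The $\sqrt N$ scaling in (A.60) is the familiar behaviour of the spectral norm of a centred random symmetric matrix. The sketch below illustrates it under an assumption made only for concreteness: $\Omega_2$ is taken to be Bernoulli($\beta$), so that $\beta ee^T - R$ has independent mean-zero entries on and above the diagonal. It estimates $\|\beta ee^T - R\|/\sqrt N$ by power iteration for a few sizes; the ratio should stay roughly constant (close to $2\sqrt{\beta(1-\beta)}$ for large $N$) rather than grow with $N$. The helper names are hypothetical, and this is an illustration, not a substitute for Theorem 4.3.

```python
import math
import random


def spectral_norm_sym(a, iters=100):
    """Power-iteration estimate of ||a|| for a symmetric matrix `a` (list of rows),
    iterating with a^2 so that eigenvalues of either sign are captured."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(a[i][j] * av[j] for j in range(n)) for i in range(n)]   # a (a v)
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in av))


def centred_bernoulli(n, beta, rng):
    """beta e e^T - R with R symmetric and R_ij ~ Bernoulli(beta) i.i.d. for i <= j."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = beta - (1.0 if rng.random() < beta else 0.0)
    return a


rng = random.Random(3)
ratios = {n: spectral_norm_sym(centred_bernoulli(n, 0.3, rng)) / math.sqrt(n)
          for n in (30, 60, 90)}
print(ratios)
```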

Moreover, we have

$$\|S_4\| = \beta\|ee^T\| = \beta r_{k+1}. \qquad (A.62)$$

The fact that

$$\|S_1\|^2 = O\Big(k\sum_{s=1}^{k+1}r_s\Big)$$

with probability tending exponentially to 1 as $\hat r \to \infty$ is an immediate consequence of the following theorem, which provides an upper bound on the norm of $S(C_q, C_s)$ holding with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.

Theorem A.6 There exists $t > 0$ such that

$$\|S_1(C_q, C_s)\| = \|S(C_q, C_s)\| \le t\sqrt{\max\{r_q, r_s\}} \qquad (A.63)$$

for all $q, s \in \{1, \dots, k+1\}$, $q \ne s$, with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.



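Proof: We consider $q, s \in \{1, \dots, k\}$ such that $q \ne s$. The derivation of the bound on $\|S_1(C_q, C_s)\|$ for the case that $q = k+1$ or $s = k+1$ is analogous. Without loss of generality, we may assume that $r_q \le r_s$. By (A.1) and (A.22), we decompose $S_{C_q,C_s}$ as $S_{C_q,C_s} = M_1 + M_2 - M_3 - (W_{C_q,C_s} - \beta ee^T)$, where

$$M_1 = \Big(\lambda_{C_q} - \frac{1}{2r_q}(\alpha r_q - \mu)e\Big)e^T,$$
$$M_2 = e\Big(\lambda_{C_s} - \frac{1}{2r_s}(\alpha r_s - \mu)e\Big)^T,$$
$$M_3 = ye^T + ez^T.$$

We first obtain a bound on the norm of $M_1$. Recall that

$$\lambda_{C_q} = \frac{1}{r_q}W_{C_q,C_q}e - \frac{1}{2r_q}\Big(\mu + \frac{1}{r_q}e^TW_{C_q,C_q}e\Big)e$$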
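by (A.7). Rearranging, we have

$$M_1 = \frac{1}{r_q}\big(W_{C_q,C_q} - \alpha ee^T\big)ee^T - \frac{1}{2r_q}\Big(\frac{1}{r_q}e^TW_{C_q,C_q}e - \alpha r_q\Big)ee^T. \qquad (A.64)$$

Note that we have

$$\Big\|\frac{1}{2r_q}\Big(\frac{1}{r_q}e^TW_{C_q,C_q}e - \alpha r_q\Big)ee^T\Big\| \le \sqrt{r_s} \qquad (A.65)$$

with probability at least (A.35) by (A.34). On the other hand, applying Theorem 4.3 shows that there exist scalars $\varphi_1, \varphi_2 > 0$ such that

$$\|W_{C_q,C_q} - \alpha ee^T\| \le \varphi_1\sqrt{r_q} \qquad (A.66)$$

with probability at least $1 - p_4$, where $p_4 := \exp(-\varphi_2\hat r^{1/2})$. Substituting (A.66) and (A.65) into (A.64) and applying the union bound, we have

$$\|M_1\| \le (\varphi_1 + 1)\sqrt{r_s} \qquad (A.67)$$

with probability at least $1 - p_1 - p_4$. Similarly, we have

$$\|M_2\| \le (\varphi_1 + 1)\sqrt{r_s} \qquad (A.68)$$

with probability at least $1 - p_1 - p_4$.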
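It remains to obtain an upper bound on $\|M_3\|$. Applying the triangle inequality, we have

$$\|M_3\| \le \sqrt{r_s}\|y\| + \sqrt{r_q}\|z\|. \qquad (A.69)$$

We begin by obtaining a bound on $\|y\|$. Note that there exists a scalar $c > 0$ such that

$$\|y\| \le \frac{1}{r_s}\Big(\|b_1\| + \frac{\sqrt{r_q}\,|b_1^Te|}{r_q + r_s}\Big) \le \frac{1}{r_s}\Big(\|b_1\| + \frac{c\,r_sr_q}{r_q + r_s}\Big) \qquad (A.70)$$

with probability at least $1 - 3p_1$ by the triangle inequality and (A.47). We next obtain a bound on $\|b_1\|$. Recall that

$$\|b_1\| \le r_s\Big\|\lambda_{C_q} - \frac{1}{2r_q}(\alpha r_q - \mu)e\Big\| + \sqrt{r_q}\,\Big|\lambda_{C_s}^Te - \frac{1}{2}(\alpha r_s - \mu)\Big| + \|W_{C_q,C_s}e - \beta r_se\|. \qquad (A.71)$$

Note that (A.67) implies that

$$\Big\|\lambda_{C_q} - \frac{1}{2r_q}(\alpha r_q - \mu)e\Big\| = \frac{1}{\sqrt{r_s}}\|M_1\| \le \varphi_1 + 1 \qquad (A.72)$$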
with probability at least $1 - p_1 - p_4$. Next, the scalars $\varphi_1, \varphi_2$ may be chosen so that

$$\|W_{C_q,C_s}e - \beta r_se\| \le \|W_{C_q,C_s} - \beta ee^T\|\,\|e\| \le \varphi_1r_s \qquad (A.73)$$

with probability at least $1 - p_4$ by Theorem 4.3. Therefore, substituting (A.44), (A.72), and (A.73) in (A.71) shows that there exists a scalar $t_1 > 0$ such that

$$\|b_1\| \le t_1r_s \qquad (A.74)$$

with probability at least $1 - p_1 - 2p_4$ by the union bound. Substituting (A.74) in (A.70) yields

$$\|y\| \le t_1 + c \qquad (A.75)$$

with probability at least $1 - 2(2p_1 + p_4)$ by the union bound. Similarly, there exists $t_2 > 0$ such that

$$\|z\| \le t_2\sqrt{\frac{r_s}{r_q}} \qquad (A.76)$$

with probability at least $1 - 2(2p_1 + p_4)$. Substituting (A.75) and (A.76) in (A.69) and applying the union bound shows that

$$\|M_3\| \le (t_1 + t_2 + c)\sqrt{r_s} \qquad (A.77)$$

with probability at least $1 - 4(2p_1 + p_4)$. Finally, combining (A.67), (A.68), and (A.77) with the bound $\|W_{C_q,C_s} - \beta ee^T\| \le \varphi_1\sqrt{r_s}$ from (A.73) shows that there exists a scalar $t > 0$ such that

$$\|S_1(C_q, C_s)\| = \|S_{C_q,C_s}\| \le t\sqrt{r_s} = t\sqrt{\max\{r_q, r_s\}} \qquad (A.78)$$

with probability at least $1 - 4(2p_1 + p_4)$, as required. ■